Introduction


Chance  plays  an  important  part  in  all  aspects  of  life.  We  take  chances  every  day: 
whether  we  catch  the  bus  or  just  miss  it;  whether  or  not  we  are  caught  in  a 
sudden  shower;  whether  or  not  we  are  involved  in  an  accident;  whether  the  shot  at 
goal  just  lands  in  the  net  or  just  misses.  Chance  or  random  variation  is  also  an 
essential  feature  of  almost  all  working  systems:  a  scientist  taking  measurements  in 
a  laboratory;  an  economist  studying  price  fluctuations;  a  surgeon  studying 
heartbeat  patterns  on  an  electrocardiogram.  In  all  these  processes,  some  elements 
of  chance  or  randomness  are  present. 

This  course  deals  with  some  of  these  random  phenomena,  the  emphasis  being  on 
modelling  and  problem-solving.  A  practical  situation  is  described  and  then  a 
probability  model  is  developed  to  describe  the  main  features.  Usually  the  model  is 
of  a  simplified  version  of  the  actual  process.  The  model  is  then  analysed 
mathematically  in  order  to  discover  the  possible  ways  in  which  the  situation 
might  develop,  and  the  probabilities  associated  with  them. 

One  example  is  the  spread  of  an  infectious  disease.  Suppose  that  one  child  in  a 
class  catches  German  measles.  What  are  the  chances  that  this  will  spread  to  most 
of  the  class  and  so  ruin  the  school  play?  Is  this  almost  certain  to  happen,  or  is  it 
quite  likely  that  only  one  or  two  other  children  will  catch  it?  And  what  about  the 
child’s  three  brothers  and  sisters?  Will  they  all  catch  German  measles  at  the  same 
time,  or  one  after  the  other,  or  not  at  all?  In  the  case  of  an  influenza  virus,  how 
likely  is  this  infection  to  develop  into  a  national  epidemic  with  thousands  of 
people  infected  and  many  deaths? 

Hereditary  titles  in  the  UK,  such  as  dukedoms,  normally  pass  on  the  death  of  the 
holder  to  the  closest  male  relative.  What  is  the  chance  that  there  will  be  no 
surviving  relative  and  so  the  title  becomes  extinct?  A  model  for  this  situation  is 
developed  in  the  course,  and  it  is  shown  how  various  probabilities  can  be  derived 
from  the  model. 

Many  population  traits,  such  as  eye  colour  or  colour  blindness,  are  inherited,  as 
are  some  diseases  such  as  haemophilia  or  sickle-cell  anaemia.  Some  of  the  basic 
theory  of  mathematical  genetics  is  developed  in  the  course,  and  answers  are 
derived  to  such  questions  as:  ‘How  likely  is  a  sufferer  from  sickle-cell  anaemia  to 
pass  it  on  to  his/her  children?’  and  ‘If  parents  with  normal  vision  have  a  child 
who  is  colour-blind,  what  are  the  chances  that  further  children  will  also  be 
colour-blind?’ 

A  more  complex  example  of  a  real-life  situation  is  a  busy  airport  like  Heathrow, 
where  the  timetable  is  planned  to  a  strict  schedule.  Aircraft  take  off  and  land 
every  two  or  three  minutes  during  the  day,  so  the  runways  are  continually  in  use. 
This  would  be  relatively  simple  if  all  planes  could  keep  exactly  to  schedule,  but 
certain  things  can  go  wrong  purely  by  chance.  For  example,  planes  may  arrive 
early  due  to  following  winds,  or  late  due  to  head  winds  or  delays  in  starting  the 
journey.  Planes  may  not  be  ready  for  take-off  on  time  because  of  mechanical 
faults  or  delays  in  loading  or  refuelling.  Air  traffic  control  then  have  the  task  of 
rescheduling.  Can  an  aircraft  be  fitted  in  for  a  late  take-off?  How  long  can 
aircraft  be  stacked,  waiting  to  land?  Are  they  carrying  sufficient  fuel,  or  should 
they  be  diverted  to  another  airport? 

These  problems  are  typical  of  many  practical  situations,  though  the  model  for  an 
airport  would  be  much  more  complex  than  those  covered  in  this  course.  One 
aspect  of  the  airport  problem  is  that  it  is  developing  over  time;  for  example,  a 
delay  to  one  plane  can  have  repercussions  for  many  hours.  Many  of  the  models 
that  are  studied  in  this  course  involve  the  development  of  a  process  over  time. 

All  the  practical  situations  that  will  be  studied  in  this  course  contain  some 
element  of  chance.  This  unit  contains  a  summary  of  basic  results  and  ideas 
concerning  probability  and  random  variables.  It  will  be  assumed  that  you  are 
familiar with these throughout the remainder of the course, so it is important for
you to study this unit thoroughly now. Exercises are included in all sections, but
at  some  stage  you  will  almost  certainly  find  that  you  need  more  practice  in  order 
to  familiarize  yourself  with  a  particular  idea  or  technique.  You  will  find  plenty  of 
extra  exercises  on  all  the  topics  covered  in  this  unit  in  the  Problem  Booklet  for 
Units  1  and  2.  You  are  strongly  urged  to  make  use  of  this  resource  to  consolidate 
your  work  on  this  unit.  You  will  also  find  a  few  questions  on  the  material  covered 
in this unit in Unit 16 (namely Questions 1.1-1.7 in the section headed Unit 1).

In  Section  1  some  of  the  fundamental  ideas  and  results  of  probability  theory  are 
discussed.  There  are  a  number  of  important  results  in  this  section  which  will  be 
used  time  and  again  in  the  course. 

Sections  2  and  3  are  concerned  with  discrete  random  variables.  If  you  have 
studied M246 Elements of Statistics then you will find that much of the material
in  Section  2  is  familiar  and  that,  even  though  this  section  is  a  long  one,  you  can 
complete  it  fairly  quickly.  Section  3  introduces  probability  generating  functions. 
As  you  will  see,  their  use  often  simplifies  calculations  involving  discrete  random 
variables.  This  is  an  important  section  as  probability  generating  functions  are 
used  frequently  in  the  course  from  Unit  4  onwards. 

Sections  4  and  5  are  devoted  to  the  study  of  continuous  random  variables. 

General  techniques  and  results  are  discussed  in  Section  4  and  some  specific 
distributions  and  their  properties  are  described  in  Section  5. 

On  many  occasions  in  this  course,  a  model  is  developed  for  some  random 
phenomenon.  The  question  arises  as  to  whether  the  model  is  a  reasonable  one. 
One  approach  is  to  use  simulation  to  obtain  some  idea  of  the  sort  of  behaviour  the 
model  predicts.  The  simulation  of  observations  from  discrete  and  continuous 
distributions  is  discussed  in  Section  6. 

The  main  purpose  of  this  unit  is  to  provide  you  with  a  summary  of  the  main 
ideas,  techniques  and  results  about  probability  and  random  variables  that  you  will 
need  as  you  study  the  rest  of  the  course.  A  secondary  aim  is  to  enable  you  to 
familiarize yourself with the recommended set of tables for the course: Neave,
Elementary  Statistical  Tables.  These  tables  are  used  in  the  text  throughout  the 
course,  and  you  will  be  expected  to  use  them  where  appropriate  for  assignments 
and  for  the  examination. 

There  are  no  audio  or  video  components  associated  with  this  unit. 
1 Probability


Some  knowledge  of  probability  and  its  rules  is  essential  in  order  to  understand  the 
development  and  analysis  of  models  for  random  phenomena.  This  section  contains 
a  brief  introduction  to  probability  theory.  In  Subsection  1.1,  the  language  and 
notation  of  probability  are  introduced  and  some  simple  rules  obtained.  In 
Subsection  1.2,  the  idea  of  conditional  probability  is  introduced  and  some  further 
rules  are  derived.  And  in  Subsection  1.3,  we  derive  an  important  result  which  is 
used  on  many  occasions  during  the  course:  the  Theorem  of  Total  Probability. 


1.1  Basics 

Many  situations  contain  some  element  of  chance:  the  toss  of  a  coin  may  result  in 
a  head  or  a  tail,  the  roll  of  a  die  may  result  in  a  score  of  1,  2,  3,  4,  5  or  6,  and  so 
on;  but  in  each  case  it  is  impossible  to  predict  with  certainty  what  the  result  will 
be.  In  many  practical  situations,  we  wish  to  quantify  the  chance  that  some 
particular  event  will  occur;  for  instance,  a  couple  may  want  to  know  the  chance 
that  their  baby  will  inherit  a  particular  genetic  disease,  or  you  might  want  to 
know  the  chance  that  you  will  have  to  wait  more  than  ten  minutes  to  be  served 
when  you  go  to  the  bank. 

The  probability  of  an  event  is  a  number  that  quantifies  the  chance  that  the  event 
occurs.  It  can  be  defined  in  a  number  of  equivalent  ways.  For  instance,  the 
probability  of  an  event  A,  which  is  denoted  P(A),  may  be  defined  to  be  the 
long-run proportion of occasions on which the event A occurs. For example, if we
were to toss a coin repeatedly, we would find that the proportion of tosses that
result in a head approaches $\frac{1}{2}$ as the number of tosses becomes large. In the long
run, half the tosses would result in a head, so the probability of a head in a single
toss is $\frac{1}{2}$.
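The long-run proportion idea can be tried out on a computer. The sketch below is only an illustration: it assumes that Python's pseudo-random number generator is an adequate stand-in for a fair coin, and the function name `proportion_of_heads` and the fixed seed are choices made here, not part of the course material.

```python
import random

def proportion_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is reproducible
    # rng.random() is uniform on [0, 1), so it is below 0.5 about half the time
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The proportion should settle near 1/2 as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_of_heads(n))
```

For small numbers of tosses the proportion can be some way from one half; it is only in the long run that it settles down.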

Several  properties  of  probabilities  can  be  deduced  from  this  definition.  First, 
since  the  probability  of  an  event  A  is  a  proportion,  it  must  be  a  number  between 
0 and 1: $0 \le P(A) \le 1$. An impossible event never occurs, so its probability is 0;
and  a  certain  event  always  occurs,  so  its  probability  is  1. 

There are various rules that are of assistance in calculating probabilities. One
rule, which can be deduced from the definition above, concerns the complement
of an event A, written $\bar{A}$; $\bar{A}$ is the event ‘not-A’ and $P(\bar{A})$ is the probability that
A does not occur. Clearly, the proportion of the time that A does not occur is
equal to one minus the proportion of the time that it does, so

\[ P(\bar{A}) = 1 - P(A). \tag{1.1} \]

Another rule concerns the probability that one or other of two mutually exclusive
events occurs: two events A and B are mutually exclusive if it is impossible for
them to occur simultaneously, that is if $P(A \cap B) = 0$. In this case, the
proportion of the time that either A or B occurs is equal to the proportion of the
time that A occurs plus the proportion of the time that B occurs; that is,
for mutually exclusive events A and B,

\[ P(A \cup B) = P(A) + P(B). \tag{1.2} \]

This  approach  to  probability  is  an  intuitive  one,  but  it  would  be  cumbersome  to 
argue  in  terms  of  long-run  proportions  every  time  we  need  to  calculate  a 
probability;  and  it  would  usually  be  impracticable  to  estimate  probabilities  by 
carrying  out  repeated  experiments.  In  practice  a  more  theoretical  approach  to 
calculating  probabilities  is  needed. 


The symbol $\cap$ is mathematical shorthand for the word ‘and’.

The symbol $\cup$ is mathematical shorthand for the word ‘or’ (and in
general covers ‘A, or B, or both’).
In this section, we consider simple situations in which some experiment or trial is
performed  and  all  the  possible  outcomes  can  be  specified.  It  is  certain  that  just 
one  outcome  will  occur,  but  it  is  impossible  to  predict  with  certainty  which  one. 

In many simple experiments, such as selecting a card, the outcomes can be
assumed to be equally likely or equiprobable; if there are N possible
equiprobable outcomes of an experiment, then each has a probability of 1/N. For
example, when tossing a fair coin, the probability of a head is equal to one half:
$P(H) = \frac{1}{2}$. And when rolling a die, the probability of a 4 is $P(4) = \frac{1}{6}$.

Rule  (1.1)  can  be  used  to  deduce  the  probability  of  not  obtaining  a  4  when  rolling 
a  die: 


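\[ P(\text{not-}4) = 1 - P(4) = 1 - \frac{1}{6} = \frac{5}{6}. \]

And using Rule (1.2) we can deduce the probability of obtaining either a 3 or a 4:

\[ P(3 \cup 4) = P(3) + P(4) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}. \]

Rule (1.2) extends in an obvious way to three or more mutually exclusive events.
So the probability of a score greater than 2 is

\[ P(3 \cup 4 \cup 5 \cup 6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{2}{3}. \]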
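The three calculations for a fair die can be checked with exact fractions. This is a minimal sketch; the dictionary `p` and the variable names are illustrative choices rather than notation from the course.

```python
from fractions import Fraction

# Six equiprobable outcomes, each with probability 1/6.
outcomes = range(1, 7)
p = {score: Fraction(1, 6) for score in outcomes}

# Rule (1.1): probability of the complement of 'a 4 is rolled'.
p_not_4 = 1 - p[4]

# Rule (1.2): the scores 3 and 4 are mutually exclusive, so probabilities add.
p_3_or_4 = p[3] + p[4]

# Rule (1.2) extended to four mutually exclusive events.
p_more_than_2 = sum(p[s] for s in outcomes if s > 2)

print(p_not_4, p_3_or_4, p_more_than_2)  # 5/6 1/3 2/3
```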

Question  1.1  Write  down  the  probability  of  each  of  the  following  events. 
Assume  in  each  case  that  the  possible  outcomes  are  equiprobable. 

(i)  A  die  shows  a  6. 

(ii)  A  die  shows  an  even  number. 

(iii)  A  card  selected  from  a  pack  is  a  heart. 

(iv)  A  card  selected  from  a  pack  is  not  a  heart. 

(v)  A  card  selected  from  a  pack  is  a  court  card  (king,  queen  or  jack). 

(vi)  A  card  selected  from  a  pack  is  both  a  heart  and  a  court  card. 

(vii)  A  card  selected  from  a  pack  is  either  a  heart  or  a  court  card  (or  both). 

(viii)  A  card  selected  from  a  pack  is  neither  a  heart  nor  a  court  card.  □ 


Here $P(3 \cup 4)$ means the probability P(3 or 4), and $P(3 \cup 4 \cup 5 \cup 6)$ means the
probability P(3 or 4 or 5 or 6).


There  are  various  rules  that  are  of  assistance  when  calculating  probabilities.  Two 
of  them,  given  by  (1.1)  and  (1.2),  have  already  been  discussed. 

Rule (1.2) specifies the probability of the occurrence of at least one of two
mutually exclusive events A and B. A third rule specifies the probability of the
occurrence of at least one of any two events A and B; the rule holds whether or
not A and B are mutually exclusive.

Suppose,  for  the  moment,  that  A  is  the  event  that  a  card  selected  from  a  pack  is  a 
heart  and  B  is  the  event  that  it  is  a  court  card.  These  events  will  be  used  to 
illustrate the rule. In parts (iii) and (v)-(vii) of Question 1.1, you found the
probabilities P(A), P(B), $P(A \cap B)$ and $P(A \cup B)$ by counting equiprobable
outcomes. In a pack of 52 cards, there are 13 hearts, 12 court cards, 3 court cards
which are also hearts and 22 cards which are either court cards or hearts or both.
This information is contained in the Venn diagram in Figure 1.1. So

\[ P(A) = \frac{1}{4}, \quad P(B) = \frac{3}{13}, \quad P(A \cap B) = \frac{3}{52} \quad \text{and} \quad P(A \cup B) = \frac{11}{26}. \]

Notice that in this example A and B are not mutually exclusive events (A and B
can occur simultaneously), so Rule (1.2) does not apply. The probability
$P(A \cup B)$ cannot be found simply by adding together P(A) and P(B). As the
diagram shows, if you simply add together the number of outcomes in A and the
number in B, then you count the outcomes in $A \cap B$ twice. So to obtain the
number of outcomes in $A \cup B$, the number in $A \cap B$ must be subtracted from the
sum of the number in A and the number in B. Correspondingly, to find $P(A \cup B)$,
$P(A \cap B)$ must be subtracted from P(A) + P(B). This gives the Addition Law:

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B). \tag{1.3} \]
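The counting argument behind the Addition Law can be reproduced by listing the 52 cards and counting. The sketch below is illustrative only: the representation of a card as a (rank, suit) pair and the helper `prob` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 equiprobable outcomes

A = {card for card in deck if card[1] == "hearts"}         # the card is a heart
B = {card for card in deck if card[0] in {"J", "Q", "K"}}  # the card is a court card

def prob(event):
    """Probability of an event when all cards are equally likely."""
    return Fraction(len(event), len(deck))

# Counts quoted in the text: 13 hearts, 12 court cards, 3 in both, 22 in either.
print(len(A), len(B), len(A & B), len(A | B))

# Addition Law: both sides agree.
print(prob(A | B), prob(A) + prob(B) - prob(A & B))  # 11/26 11/26
```

Adding P(A) and P(B) alone would give 25/52, which counts the three court cards that are hearts twice.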
This result holds for any two events A and B, not just the two in the example
above. When A and B are mutually exclusive events, $A \cap B = \emptyset$, so
$P(A \cap B) = 0$ and (1.3) reduces to (1.2). So Rule (1.2) is just a special case of the
Addition Law (1.3).

Question 1.2 Given P(A) = 0.5, P(B) = 0.4 and $P(A \cap B) = 0.1$, use
Rules (1.1) and (1.3) to find the following probabilities.

(i) $P(\bar{A})$ (ii) $P(\bar{B})$ (iii) $P(A \cup B)$ (iv) $P(A \cap B)$ (v) $P(A \cap B)$
(Hint: You may find a Venn diagram helpful for parts (iv) and (v).) □

This  course  is  about  applying  ideas  and  rules  of  probability  to  various  models  for 
random  phenomena.  We  shall  not  be  concerned  with  the  fine  details  involved  in 
deriving  the  rules  and  the  techniques  we  use.  This  would  require  a  more 
mathematical  approach  to  probability.  Such  an  approach  would  make  greater  use 
of  set  notation  than  we  have  done.  One  such  approach  involves  starting  with  a 
small  set  of  rules,  called  the  Axioms  of  Probability,  and  proving  all  subsequent 
rules  assuming  only  these  axioms.  Here  is  how  probability  might  be  introduced 
using  this  approach. 

The set $\Omega$ of all possible outcomes of an experiment is called the event space.
(For instance, if the experiment is rolling a die the event space is
$\Omega = \{1, 2, 3, 4, 5, 6\}$.) An event A is defined as a subset of the event space $\Omega$; so
$A \subseteq \Omega$. An event may be a single outcome, perhaps selecting the six of hearts
from  a  pack  or  observing  that  a  queue  contains  exactly  three  people.  It  may, 
however,  contain  several  outcomes,  perhaps  selecting  any  heart  or  observing  that 
the  queue  contains  more  than  three  people.  Associated  with  every  possible  event 
A  is  a  number  P(A)  that  is  called  the  probability  of  the  (occurrence  of  the) 
event  A.  A  probability  satisfies  three  axioms,  as  follows. 


Axiom I

$0 \le P(A) \le 1$ for every event A.

Axiom II

$P(\Omega) = 1$.

Axiom III

If $A_i \cap A_j = \emptyset$ for all $i \ne j$, then

\[ P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i). \]

The first axiom states that a probability is a number lying between 0 and 1
(inclusive). Since the event space $\Omega$ contains all possible outcomes, the second
axiom expresses the fact that one or other of the outcomes must occur. The
statement $A_i \cap A_j = \emptyset$ says that it is impossible for $A_i$ and $A_j$ to occur together:
that is, $A_i$ and $A_j$ are mutually exclusive events. So the third axiom states that
the  probability  that  one  or  other  of  a  set  of  mutually  exclusive  events  will  occur  is 
equal  to  the  sum  of  the  separate  probabilities  of  the  events. 
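For a finite event space the axioms are easy to check mechanically. The following sketch assumes that a probability is specified by giving a number to each single outcome; the function names `is_probability_assignment` and `prob` are hypothetical, chosen for this illustration.

```python
from fractions import Fraction

def is_probability_assignment(p):
    """Check Axioms I and II for a dict mapping each outcome to its probability.

    If every single outcome has a probability between 0 and 1 and the values
    sum to 1, then every event built from them also satisfies Axiom I.
    """
    axiom_1 = all(0 <= value <= 1 for value in p.values())
    axiom_2 = sum(p.values()) == 1
    return axiom_1 and axiom_2

def prob(event, p):
    """Axiom III: single outcomes are mutually exclusive, so their probabilities add."""
    return sum(p[outcome] for outcome in event)

fair_die = {score: Fraction(1, 6) for score in range(1, 7)}
print(is_probability_assignment(fair_die))  # True
print(prob({2, 4, 6}, fair_die))            # 1/2
```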

These  axioms  state  properties  and  results  that  we  obtained  earlier  using  a  more 
practical  and  intuitive  approach  to  probability.  They  are  stated  in  the  Handbook 
together  with  Rules  (1.1)  and  (1.3)  and  a  number  of  other  rules  that  are  discussed 
in  the  next  subsection. 


The symbol $\emptyset$ is read ‘the empty set’, meaning A and B have no
common outcomes at all: this is what ‘mutually exclusive’ means.

The words in parentheses are often omitted.
1.2 Dependent events and conditional probability

Frequently  the  probability  that  an  event  occurs  is  dependent  on  whether  or  not 
another  event  occurs.  Consider,  for  instance,  the  following  scenario.  A  die  is 
rolled. The event A occurs if a score of 3 or less is obtained and the event E
occurs if an even number is obtained; then $P(A) = \frac{1}{2}$ and $P(E) = \frac{1}{2}$. However,
suppose it is known whether an even number or an odd number has been
rolled, that is, whether E or $\bar{E}$ has occurred. The probability that A occurs is
now dependent on whether or not E has occurred: if it is known that the score is
even (E), then there are three possible outcomes, one of which is 3 or less, so in
this case the probability of a score of 3 or less is $\frac{1}{3}$; on the other hand, if it is
known that the score is odd ($\bar{E}$), then the probability of a score of 3 or less is $\frac{2}{3}$.

In  general,  if  the  probability  that  an  event  A  occurs  is  dependent  on  whether  or 
not  another  event  E  has  occurred,  then  we  speak  of  the  conditional  probability 
of A given E, and denote this by $P(A \mid E)$. This conditional probability is defined
by the formula

\[ P(A \mid E) = \frac{P(A \cap E)}{P(E)}; \tag{1.4} \]

it is defined only when $P(E) \ne 0$.


In the die example, $P(A \cap E) = \frac{1}{6}$ since, for only one of the six possible
outcomes (2) do both A and E occur. Also $P(E) = \frac{1}{2}$. So by the definition above

\[ P(A \mid E) = \frac{P(A \cap E)}{P(E)} = \frac{1/6}{1/2} = \frac{1}{3}, \]

confirming the result obtained by counting outcomes.
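The die example can also be worked through by treating events as sets of outcomes and applying Formula (1.4) directly. This is a sketch under the assumption of six equiprobable outcomes; the helper names `prob` and `cond_prob` are illustrative.

```python
from fractions import Fraction

omega = set(range(1, 7))              # event space for one roll of a die
A = {s for s in omega if s <= 3}      # a score of 3 or less
E = {s for s in omega if s % 2 == 0}  # an even score

def prob(event):
    """Probability of an event when the six outcomes are equally likely."""
    return Fraction(len(event), len(omega))

def cond_prob(a, e):
    """Conditional probability of a given e, as in Formula (1.4)."""
    if prob(e) == 0:
        raise ValueError("conditional probability is undefined when P(E) = 0")
    return prob(a & e) / prob(e)

print(cond_prob(A, E))          # 1/3, given an even score
print(cond_prob(A, omega - E))  # 2/3, given an odd score
```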


Question  1.3  For  the  events  A  and  B  of  Question  1.2,  find  the  following 
conditional  probabilities. 

(i) $P(A \mid B)$ (ii) $P(A \mid B)$ (iii) $P(A \mid B)$ (iv) $P(B \mid A)$ □

Now  try  the  following  question.  The  first  thing  you  will  need  to  do  in  problems 
like  this  is  identify  the  events  (give  them  letters  as  names  if  this  is  not  done  in  the 
question)  and  write  down  all  the  probabilities  given  in  the  question  using  these 
names.  Then  you  can  use  the  rules  to  calculate  the  probabilities  required. 

Question  1.4  On  any  day  the  probability  that  a  man  watches  the  9  p.m.  news 
is  0.45  and  the  probability  that  he  watches  the  10  p.m.  news  is  0.35.  The 
probability  that  he  watches  neither  bulletin  is  0.25.  Let  E  be  the  event  that  he 
watches  the  earlier  bulletin  and  let  L  be  the  event  that  he  watches  the  later 
bulletin. 

(i) Write down all the probabilities given in the question using the event names
E  and  L  (for  example,  P(E)  =  0.45). 

(ii)  What  is  the  probability  that  he  watches  both  bulletins  on  a  particular  day? 
(Hint: Find $P(E \cup L)$ first.)

(iii)  If  he  watched  the  news  at  9  p.m.,  what  is  the  probability  that  he  will  watch 
it  at  10  p.m.? 

(iv)  If  he  did  not  watch  the  news  at  9  p.m.,  what  is  the  probability  that  he  will 
not  watch  it  at  10  p.m.  either?  □ 


Notice that the Formula (1.4), which defines the conditional probability $P(A \mid E)$,
can be rearranged to give a formula for the joint probability $P(A \cap E)$ in terms of
P(E) and the conditional probability $P(A \mid E)$:

\[ P(A \cap E) = P(A \mid E) P(E). \tag{1.5} \]

You  will  find  this  form  useful  in  the  next  question. 

Question  1.5  Suppose  that  the  probability  of  rainfall  tomorrow  depends  on 
today’s  weather.  If  it  rains  today,  then  the  probability  that  it  will  rain  tomorrow 
is  0.7;  while  if  today  is  dry,  then  the  probability  that  it  will  rain  tomorrow  is  0.55. 
Suppose  further  that  the  probability  that  it  will  rain  today  is  0.6.  Let  R  be  the 
event  that  it  will  rain  today,  and  let  S  be  the  event  that  it  will  rain  tomorrow. 

(i)  Write  down  all  the  probabilities  given  in  the  question  using  the  event  names 
R  and  S. 

(ii)  Find  the  probability  that  it  will  rain  today  and  tomorrow. 

(iii)  Find  the  probability  that  it  will  rain  neither  today  nor  tomorrow.  □ 

Bayes’  Formula 

Formula  (1.5),  which  follows  from  the  definition  of  conditional  probability,  can  be 
used  to  obtain  an  important  result  known  as  Bayes’  Formula.  For  any  two 
events A and B,

\[ P(A \cap B) = P(A \mid B) P(B) \]

and

\[ P(B \cap A) = P(B \mid A) P(A). \]

Since $P(A \cap B) = P(B \cap A)$, it follows that

\[ P(A \mid B) P(B) = P(B \mid A) P(A). \]

Dividing by P(B) leads to the result in the box below.

Bayes’ Formula

\[ P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}, \quad \text{provided } P(B) \ne 0. \tag{1.6} \]


Example  1.1 

40%  of  a  newsagent’s  regular  customers  buy  a  morning  newspaper  every  day, 
while  25%  buy  an  evening  newspaper  daily.  Of  those  customers  who  buy  a 
morning  newspaper,  55%  also  buy  an  evening  newspaper.  One  customer,  Tom,  is 
selected  at  random  from  the  regular  customers. 

(i)  If  it  is  known  that  Tom  buys  an  evening  newspaper  daily,  what  is  the 
probability  that  he  also  buys  a  morning  newspaper  every  day? 

(ii)  In  fact,  Tom  does  not  buy  an  evening  newspaper.  What  is  the  probability 
that  he  buys  a  morning  newspaper  every  day?  What  is  the  probability  that 
he  does  not  buy  a  morning  newspaper  every  day? 


Solution 


Let  M  be  the  event  that  Tom  buys  a  morning  newspaper  every  day  and  let  E  be 
the  event  that  he  buys  an  evening  newspaper  every  day.  We  are  given 

\[ P(M) = 0.4, \quad P(E) = 0.25 \quad \text{and} \quad P(E \mid M) = 0.55. \]

(i) The probability required is $P(M \mid E)$. Using Bayes’ Formula,

\[ P(M \mid E) = \frac{P(E \mid M) P(M)}{P(E)} = \frac{0.55 \times 0.4}{0.25} = 0.88. \]

So the probability that Tom also buys a morning newspaper is 0.88.

(ii) In this case, the probability that Tom buys a morning newspaper every day
is $P(M \mid \bar{E})$. Again, using Bayes’ Formula,

\[ P(M \mid \bar{E}) = \frac{P(\bar{E} \mid M) P(M)}{P(\bar{E})}. \]

Since $P(\bar{E} \mid M) = 1 - P(E \mid M)$ and $P(\bar{E}) = 1 - P(E)$,

\[ P(M \mid \bar{E}) = \frac{(1 - 0.55) \times 0.4}{1 - 0.25} = 0.24. \]

So, given the information that Tom does not buy an evening newspaper, the
probability that he buys a morning newspaper every day is 0.24. It follows
that the probability that he does not buy a morning newspaper every day is

\[ P(\bar{M} \mid \bar{E}) = 1 - P(M \mid \bar{E}) = 0.76. \] □
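The arithmetic of Example 1.1 can be checked with a few lines of code. The sketch below uses exact fractions to avoid rounding error; the function name `bayes` and the variable names are illustrative and do not come from the course text.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' Formula: P(A|B) = P(B|A) P(A) / P(B), provided P(B) != 0."""
    if p_b == 0:
        raise ValueError("Bayes' Formula requires P(B) != 0")
    return p_b_given_a * p_a / p_b

p_m = Fraction("0.4")           # P(M): buys a morning newspaper
p_e = Fraction("0.25")          # P(E): buys an evening newspaper
p_e_given_m = Fraction("0.55")  # P(E|M)

# Part (i): P(M|E).
p_m_given_e = bayes(p_e_given_m, p_m, p_e)

# Part (ii): P(M|not-E), using Rule (1.1) for P(not-E|M) and P(not-E).
p_m_given_not_e = bayes(1 - p_e_given_m, p_m, 1 - p_e)

print(float(p_m_given_e), float(p_m_given_not_e), float(1 - p_m_given_not_e))
# 0.88 0.24 0.76
```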

Question  1.6  A  group  of  children  are  given  two  maths  problems  to  solve,  the 
second  of  which  is  more  difficult  than  the  first.  Two-thirds  of  the  children  solve 
the  first  problem  correctly;  and  three-fifths  of  them  solve  the  second  correctly. 
However,  if  a  child  solves  the  first  problem  correctly,  then  he  or  she  has  a 
conditional  probability  of  |  of  also  solving  the  second  problem  correctly.  One 
child,  Anna,  is  selected  at  random  from  the  group.  Denote  by  F  the  event  that 
Anna  solves  the  first  problem  correctly  and  denote  by  S  the  event  that  she  solves 
the  second  problem  correctly. 

(i)  Write  down  all  the  probabilities  given  in  the  question  using  the  event  names 
F  and  S. 

(ii)  If  it  is  known  that  she  solved  the  second  problem  correctly,  what  is  the 
probability  that  she  also  solved  the  first  problem  correctly? 

(iii) It is known that Anna got the first problem wrong. What is the conditional
probability  that  she  got  the  second  problem  right?  What  is  the  conditional 
probability  that  she  got  the  second  one  wrong?  □ 

Independent  events 

If the occurrence of an event A does not depend on whether or not another event
E has occurred, then $P(A \mid E) = P(A)$. In this case Formula (1.4) can be
rearranged to give the formula

\[ P(A \cap E) = P(A) P(E) \tag{1.7} \]


and  the  events  A  and  E  are  said  to  be  independent. 
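Formula (1.7) gives a simple test for independence when the outcomes can be listed. In the sketch below, A and E are the die events used earlier in this subsection; the event C (a score of 4 or less) is a hypothetical extra event introduced here purely for illustration, as are the helper names.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Probability of an event when the six outcomes are equally likely."""
    return Fraction(len(event), len(omega))

def are_independent(a, e):
    """Check Formula (1.7): P(A and E) = P(A) P(E)."""
    return prob(a & e) == prob(a) * prob(e)

A = {1, 2, 3}     # a score of 3 or less
E = {2, 4, 6}     # an even score
C = {1, 2, 3, 4}  # a score of 4 or less (hypothetical event for illustration)

print(are_independent(A, E))  # False: 1/6 is not equal to 1/2 x 1/2
print(are_independent(C, E))  # True: 1/3 is equal to 2/3 x 1/2
```

So knowing that the score is even changes the probability of A, but not the probability of C.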

Question  1.7  A  manufactured  article  has  probability  0.1  of  having  a  defect  of  a 
certain  type,  and  independently  has  probability  0.05  of  having  a  defect  of  another 
type.  An  article  is  chosen  at  random.  Let  A  and  B  be  the  events  that  a  defect 
occurs  of  the  first  and  second  type  respectively. 

(i)  Find  the  probability  that  the  article  has  both  types  of  defect. 

(ii)  Find  the  probability  that  the  article  has  only  one  of  the  two  types  of 
defect.  □ 
1.3 The Theorem of Total Probability


A  result  that  will  be  used  on  many  occasions  in  the  course  is  the  Theorem  of 
Total  Probability.  This  theorem  gives  an  expression  for  the  probability  of 
occurrence  of  an  event  A  when  A  occurs  simultaneously  with  one  of  a  set 
$E_1, E_2, \ldots, E_n$ of mutually exclusive and exhaustive events.

The events $E_1, E_2, \ldots, E_n$ are mutually exclusive and exhaustive if only one
of the events can occur at a time and if one or other of them must occur. That is,

\[ E_i \cap E_j = \emptyset \quad \text{for } i \ne j, \ 1 \le i, j \le n, \]

and

\[ E_1 \cup E_2 \cup \cdots \cup E_n = \Omega. \]

This  is  illustrated  in  the  Venn  diagram  in  Figure  1.2. 

Since the events $E_1, E_2, \ldots, E_n$ are mutually exclusive and exhaustive, the event A
can be written as (see Figure 1.3)

\[ A = (A \cap E_1) \cup (A \cap E_2) \cup \cdots \cup (A \cap E_n). \]

The events $A \cap E_1, A \cap E_2, \ldots, A \cap E_n$ are mutually exclusive, so (by Rule (1.2))

\[ P(A) = P(A \cap E_1) + P(A \cap E_2) + \cdots + P(A \cap E_n). \]

Finally, using Formula (1.5), we obtain

\[ P(A) = P(A \mid E_1) P(E_1) + P(A \mid E_2) P(E_2) + \cdots + P(A \mid E_n) P(E_n). \]

This  is  the  required  result.  It  is  stated  formally  in  the  box  below. 

The Theorem of Total Probability

For an event A,

\[ P(A) = \sum_{i=1}^{n} P(A \mid E_i) P(E_i), \tag{1.8} \]

where $E_1, E_2, \ldots, E_n$ are mutually exclusive and exhaustive events.

A special case of this result is obtained by observing that E and $\bar{E}$ are mutually
exclusive and exhaustive events, so

\[ P(A) = P(A \mid E) P(E) + P(A \mid \bar{E}) P(\bar{E}) \tag{1.9} \]

for any two events A and E.

Example  1.2 

Whether  or  not  Sam  can  start  his  car  first  time  in  the  morning  depends  on 
whether  or  not  it  was  garaged  the  evening  before.  If  it  was,  his  car  starts  first 
time  with  probability  0.8.  If  it  was  not,  it  starts  first  time  with  probability 
only  0.15.  The  car  is  garaged  with  probability  0.9.  On  what  percentage  of 
mornings  does  Sam  start  his  car  first  time? 


Figure 1.2 A set of four mutually exclusive and exhaustive events.

Figure 1.3 (Venn diagram with regions labelled $A \cap E_1, \ldots, A \cap E_4$.)
Solution


To  answer  this  question,  we  first  need  to  label  the  events.  For  example,  let  F  be 
the  event  that  the  car  starts  first  time  and  let  G  be  the  event  that  the  car  was 
garaged  the  previous  evening.  Then  we  have 

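\[ P(F \mid G) = 0.8, \quad P(F \mid \bar{G}) = 0.15, \quad P(G) = 0.9, \]

and we require P(F).

By the Theorem of Total Probability (in the form (1.9)), we have

\begin{align*}
P(F) &= P(F \mid G) P(G) + P(F \mid \bar{G}) P(\bar{G}) \\
     &= 0.8 \times 0.9 + 0.15 \times 0.1 \quad \text{(using Rule (1.1))} \\
     &= 0.735.
\end{align*}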

Sam  starts  his  car  first  time  on  73.5%  of  mornings.  □ 
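The calculation in Example 1.2 is an instance of Formula (1.8) with two events, and it can be written as a small general-purpose function. This is only a sketch: the name `total_probability` and its list-based interface are assumptions made for illustration.

```python
from fractions import Fraction

def total_probability(conditionals, priors):
    """Theorem of Total Probability: the sum of P(A|E_i) P(E_i) over i.

    conditionals[i] is P(A|E_i) and priors[i] is P(E_i). The events E_i must be
    mutually exclusive and exhaustive, so the priors must sum to 1.
    """
    if len(conditionals) != len(priors):
        raise ValueError("need one conditional probability per event E_i")
    if sum(priors) != 1:
        raise ValueError("the probabilities P(E_i) must sum to 1")
    return sum(c * p for c, p in zip(conditionals, priors))

p_g = Fraction("0.9")               # P(G): the car was garaged
p_f_given_g = Fraction("0.8")       # P(F|G)
p_f_given_not_g = Fraction("0.15")  # P(F|not-G)

p_f = total_probability([p_f_given_g, p_f_given_not_g], [p_g, 1 - p_g])
print(float(p_f))  # 0.735
```

The same function applies unchanged to a set of three or more mutually exclusive and exhaustive events.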

The  following  questions  will  give  you  some  practice  at  applying  the  Theorem  of 
Total  Probability  and  some  of  the  other  results  discussed  in  this  section. 


Question  1.8  A  diagnostic  screening  test  for  a  particular  disease  has  been 
shown  empirically  to  be  fairly  reliable:  a  positive  result  is  recorded  in  96%  of  cases 
where  a  patient  actually  has  the  disease,  and  only  in  3%  of  cases  where  the  patient 
is  healthy.  The  disease  is  known  to  afflict  1  person  in  250.  Let  R  be  the  event  that 
a  positive  result  is  recorded  and  let  D  be  the  event  that  a  patient  has  the  disease. 

(i)  What  proportion  of  patients  screen  positive  on  this  diagnostic  test? 

(ii) What proportion of those screening positive are actually suffering from the
disease?  □ 


Question  1.9  A  judicial  court  in  a  certain  country  may  return  any  one  of  the 
three  verdicts  ‘guilty’,  ‘not  guilty’  and  ‘not  proven’.  Of  the  cases  tried  by  this 
court,  70%  of  the  verdicts  were  ‘guilty’,  20%  were  ‘not  guilty’,  and  10%  were  ‘not 
proven’.  Suppose  that  when  the  court’s  verdict  is  ‘guilty’  there  is  a  probability 
of  0.05  that  the  accused  is  actually  innocent  and  that  the  corresponding 
probabilities  when  the  verdicts  are  ‘not  guilty’  and  ‘not  proven’  are  0.95  and  0.25 
respectively. Let G, N, U be the events that the verdicts ‘guilty’, ‘not guilty’, ‘not
proven’ are returned. Let I be the event that an accused person is innocent.

(i)  What  is  the  probability  that  an  accused  person  is  actually  innocent? 

(ii) What proportion of innocent persons will be found ‘guilty’? □

The rules of probability introduced in this section will be used on many occasions
in this course. The definition of conditional probability, Bayes’ Formula and the
Theorem  of  Total  Probability  are  particularly  important.  If  you  need  further 
practice  to  familiarize  yourself  with  using  these  results,  then  you  will  find  more 
exercises  in  the  Problem  Booklet  and  in  Unit  16. 


2  Discrete  random  variables 


This  section  contains  a  summary  of  results  and  techniques  concerning  discrete 
random  variables.  In  Subsection  2.1,  a  random  variable  is  defined  and  the 
probability  function  of  a  discrete  random  variable  is  introduced.  The 
independence  of  random  variables  is  also  discussed  briefly.  The  main  properties  of 
six  specific  discrete  distributions  are  summarized  in  Subsection  2.2;  you  will  need 
to  have  Neave  to  hand  as  you  work  through  this  subsection.  Finally,  expectation 
for discrete random variables is discussed in Subsection 2.3.
2.1 Random variables


Chance  is  a  feature  of  many  situations.  For  instance,  if  light  bulbs  are  tested  to 
failure,  then  the  time  until  a  bulb  fails  varies  from  bulb  to  bulb.  So  the  time  T 
until  a  randomly  chosen  bulb  fails  is  a  random  quantity:  it  is  an  example  of  a 
random variable or variate. Other examples include the height H of a
randomly  chosen  woman,  the  number  N  of  children  in  a  randomly  chosen 
household  and  the  number  C  of  calls  to  a  telephone  helpline  in  a  two-hour  period. 

The  set  of  all  the  possible  values  that  a  random  variable  can  take  is  called  its 
range.  When  the  range  contains  only  a  finite  or  countably  infinite  number  of 
values, the random variable is said to be discrete. The random variables N, the
number of children in a household, and C, the number of calls to a helpline, are
discrete random variables. On the other hand, each of the other two random
variables above (T, the time to failure of a bulb, and H, the height of a
woman) can take any value in an interval of values: they are examples of
continuous  random  variables.  Continuous  random  variables  are  discussed  in 
Section  4;  this  section  is  devoted  to  discrete  random  variables. 

To  illustrate  the  main  ideas  concerning  discrete  random  variables,  we  shall 
consider  a  simple  example.  Let  X  be  the  score  obtained  when  a  fair  die  is  rolled; 
then  the  range  of  X  is  {1,  2,  3,  4,  5,  6}.  The  probability  function  of  X  specifies 
the probability that X takes each value in the range; it is denoted by $p_X(x)$. So
for each x in the range of X,

$$p_X(x) = P(X = x).$$

For the score on a die,

$$p_X(x) = \tfrac{1}{6}, \quad x = 1, 2, \ldots, 6.$$

In this course, we shall omit the subscript X unless there is any ambiguity. So, for
the die example, we would write simply

$$p(x) = \tfrac{1}{6}, \quad x = 1, 2, \ldots, 6.$$

Question  2.1  In  a  game,  a  die  is  rolled.  If  the  die  shows  6,  then  a  player  moves 
three  spaces  forward;  if  it  shows  4  or  5,  then  he  moves  two  spaces  forward;  and  if 
it  shows  1,  2  or  3,  then  he  moves  one  space  backward.  Let  Y  represent  the 
number  of  spaces  moved  forward  when  the  die  is  rolled. 

(i)  What  is  the  range  of  the  random  variable  Y? 

(ii)  Write  down  the  probability  function  of  Y.  □ 

This  example  of  rolling  a  die  illustrates  the  fact  that  there  can  be  several  different 
random  variables  associated  with  an  experiment.  In  a  practical  situation,  the 
choice  of  random  variable  is  governed  by  the  problem  to  be  solved. 

In  Section  1  we  indicated  how  the  development  of  a  mathematical  theory  of 
probability  might  begin.  In  such  a  theory,  a  mathematical  definition  of  a  random 
variable is required in place of the informal description just given. Formally, if
$\Omega$ is the event space of an experiment, then a random variable is defined to be a
function with domain $\Omega$ and codomain $\mathbb{R}$. The image set of a random variable X
is called its range and is denoted $\Omega_X$.


‘Variate’  and  ‘random  variable’  are 
synonyms.  Both  terms  are  used 
frequently  throughout  the  course. 
We shall not be concerned with the mathematical definition in this course
(although you will find such a definition in the Handbook). However, the notation
$\Omega_X$ for the range of a random variable X is a useful shorthand and so it is used
occasionally, both in the Handbook and in the units. For instance, it enables us to
write down the basic properties of a probability function in a concise form:

(i) $0 \le p_X(x) \le 1$ for all $x \in \Omega_X$;

(ii) $\sum_{x \in \Omega_X} p_X(x) = 1$.

The first property follows from the fact that, for each x in the range of X, $p_X(x)$
is a probability and so lies between 0 and 1. The second property follows because
X must take some value in its range, so the sum of all the probabilities $p_X(x)$
must be equal to 1.


Independent  random  variables 

The  idea  of  independence,  which  was  introduced  for  events  in  Subsection  1.2,  can 
be  extended  to  random  variables.  Two  events  A  and  E  are  said  to  be  independent 
if  the  occurrence  of  one  event  does  not  depend  on  whether  or  not  the  other  event 
has  occurred,  and  if  this  is  the  case  then 


$$P(A \cap E) = P(A)\,P(E).$$

In  a  similar  way,  two  random  variables  X  and  Y  are  said  to  be  independent  if  the 
occurrence  of  any  event  associated  with  X  does  not  depend  on  the  occurrence  of 
any  event  associated  with  Y.  For  discrete  random  variables  this  is  equivalent  to 
the  following  condition: 


$$P(X = x,\, Y = y) = P(X = x)\,P(Y = y) \quad \text{for all } x \in \Omega_X,\ y \in \Omega_Y.$$

The joint probability P(X = x, Y = y) is denoted p(x, y), so this condition may
be written

$$p(x, y) = p_X(x)\,p_Y(y) \quad \text{for all } x \in \Omega_X,\ y \in \Omega_Y.$$


The  comma  in  the  probability 
P(X = x, Y = y) may be read as
‘and’. 


Example  2.1 

Suppose  that  two  dice  are  thrown.  If  X  is  the  score  on  the  red  die  and  Y  is  the 
score  on  the  blue  die  then,  since  there  are  36  equally  likely  possible  outcomes  for 
the  pair  of  scores, 

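$$p(x, y) = \tfrac{1}{36} \quad \text{for } x = 1, 2, \ldots, 6,\ y = 1, 2, \ldots, 6.$$

But there are six equally likely possible outcomes for the score on a single die, so

$$p_X(x) = \tfrac{1}{6} \quad \text{for } x = 1, 2, \ldots, 6$$

and

$$p_Y(y) = \tfrac{1}{6} \quad \text{for } y = 1, 2, \ldots, 6.$$

Therefore

$$p(x, y) = p_X(x)\,p_Y(y) \quad \text{for } x = 1, 2, \ldots, 6,\ y = 1, 2, \ldots, 6.$$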

The  random  variables  are  independent.  □ 
Example 2.2

Now  suppose  that  X  is  the  score  on  a  die  and  that  Y  is  a  random  variable  that 
takes  the  value  1  if  the  score  is  even  and  the  value  0  if  the  score  is  odd.  Then  (for 
example), 

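$$p(2, 1) = P(X = 2,\, Y = 1) = \tfrac{1}{6}.$$

But

$$p_X(2) = \tfrac{1}{6} \quad \text{and} \quad p_Y(1) = \tfrac{1}{2},$$

so

$$p_X(2)\,p_Y(1) = \tfrac{1}{6} \times \tfrac{1}{2} = \tfrac{1}{12}.$$

Hence,

$$p(2, 1) \ne p_X(2)\,p_Y(1)$$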

and  so,  in  this  case,  the  random  variables  are  not  independent.  □ 
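
The independence condition can be tested mechanically for any joint probability function with a finite range. The sketch below is an illustration only (the function name `is_independent` and the dictionary representation of the joint probabilities are our own choices); it applies the condition to the distributions of Examples 2.1 and 2.2.

```python
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """Check p(x, y) == p_X(x) * p_Y(y) for every pair in the product of ranges.

    joint maps (x, y) pairs to probabilities; pairs that are absent have
    probability 0.
    """
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Marginal probability functions, obtained by summing the joint one.
    p_x = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y]
               for x, y in product(xs, ys))

faces = range(1, 7)

# Example 2.1: X and Y are the scores on two different dice.
two_dice = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Example 2.2: X is the score on one die, Y = 1 if X is even and 0 if X is odd.
score_and_parity = {(x, 1 - x % 2): Fraction(1, 6) for x in faces}

print(is_independent(two_dice))          # True
print(is_independent(score_and_parity))  # False
```

Note that a single pair (x, y) for which the condition fails, such as (2, 1) in Example 2.2, is enough to show that two random variables are not independent.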


2.2  Specific  probability  distributions 

There  are  several  common  discrete  distributions  which  occur  repeatedly  in 
probability  models  of  real-life  situations.  In  this  subsection  they  are  listed, 
together  with  their  probability  functions,  and  examples  of  situations  to  which 
they  are  appropriate  are  given. 

(1) The discrete uniform distribution

The random variable X is said to follow a discrete uniform distribution, or to
be uniformly distributed, if

$$p(x) = 1/n, \quad x = 1, 2, \ldots, n.$$

The  score  on  the  throw  of  a  die  (n  =  6)  can  be  modelled  by  a  discrete  uniform 
distribution. 

(2) The Bernoulli distribution

A Bernoulli trial is an experiment that has precisely two outcomes: either an
event E occurs or it does not. These outcomes are often described as ‘success’ or
‘failure’, and their probabilities are denoted by p and q (= 1 − p) respectively. The
random variable X is said to follow a Bernoulli distribution with parameter p
if P(X = 1) = p and P(X = 0) = q. The probability function of X can be written

$$p(x) = p^x q^{1-x}, \quad x = 0, 1.$$

$a^0 = 1$ for all $a > 0$.

(3) The binomial distribution

Suppose that a sequence of n independent Bernoulli trials is performed. At each
trial either a success occurs, with probability p, or a failure is recorded, with
probability 1 − p = q. The total number of successes in the n trials can be
denoted by a random variable Y with range {0, 1, ..., n}. The probability
function of Y is given by

$$p(y) = \binom{n}{y} p^y q^{n-y}, \quad y = 0, 1, \ldots, n, \qquad (2.1)$$

where

$$\binom{n}{y} = \frac{n!}{y!\,(n-y)!}.$$

0! is defined to be 1.


Here, Y is said to follow a binomial distribution with parameters (n, p), and
this is written Y ~ B(n, p). If $X_i$ is the Bernoulli variate representing the result
of the ith trial, then $X_i$ ~ B(1, p). Since $X_i = 1$ if and only if the ith trial results
in success, it follows that

$$Y = X_1 + X_2 + \cdots + X_n,$$

so  the  binomial  variate  Y  is  equal  to  the  sum  of  n  independent  Bernoulli  variates. 

For  small  values  of  n,  binomial  probabilities  can  be  calculated  directly,  as  in  the 
next  example. 


Example  2.3 


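Suppose that Jim is late for work a third of the time. Then Y, the number of
days he is late in a five-day working week, has a binomial distribution:
$Y \sim B(5, \tfrac{1}{3})$. The probability that he is late twice in a particular week is given by

$$
\begin{aligned}
P(Y = 2) &= \binom{5}{2} \left(\tfrac{1}{3}\right)^2 \left(\tfrac{2}{3}\right)^3 \\
         &= \frac{5!}{2!\,3!} \times \frac{1}{9} \times \frac{8}{27} \\
         &= 10 \times \frac{8}{243} \\
         &= \frac{80}{243} \simeq 0.3292.
\end{aligned}
$$

□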
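
For readers who have a computer to hand as well as a calculator, Formula (2.1) can be evaluated directly. The sketch below is illustrative only: the function name `binomial_pmf` is our own, and `math.comb` (available in Python 3.8 and later) supplies the binomial coefficient.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(y, n, p):
    """Formula (2.1): P(Y = y) for Y ~ B(n, p), with q = 1 - p."""
    q = 1 - p
    return comb(n, y) * p**y * q**(n - y)

# Example 2.3: Y ~ B(5, 1/3), probability of being late exactly twice.
p_late_twice = binomial_pmf(2, 5, Fraction(1, 3))
print(p_late_twice)                   # 80/243
print(round(float(p_late_twice), 4))  # 0.3292

# The probabilities over the whole range {0, 1, ..., n} sum to 1.
assert sum(binomial_pmf(y, 5, Fraction(1, 3)) for y in range(6)) == 1
```

Passing p as a `Fraction` keeps the answer exact; passing an ordinary decimal such as 0.42 gives a floating-point approximation instead.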


For  n  <  20  and  selected  values  of  p,  binomial  probabilities  are  given  on 
pages  4,6,8, 10  of  Neave.  If  you  wish  to  practise  using  these  tables  then  you  will 
find  exercises  in  the  Problem  Booklet.  However,  sometimes  a  binomial  probability 
is  required  for  a  value  of  p  that  is  not  listed  in  Neave.  When  this  is  the  case,  you 
should  use  Formula  (2.1)  and  your  calculator:  using  the  nearest  value  of  p  listed 
in  the  tables  does  not  in  general  give  a  sufficiently  accurate  answer. 

A common source of error when calculating binomial probabilities is in evaluating
the binomial coefficients $\binom{n}{y}$. Some calculators give binomial coefficients. But
don’t worry if yours does not: binomial coefficients for n = 1 to 36 and n = 52 are
listed on page 44 of Neave.


Question  2.2  Use  your  calculator  (and,  if  required,  the  table  of  binomial 
coefficients  on  page  44  of  Neave )  to  calculate  the  following  probabilities. 

(i) P(Y = 4), where Y ~ B(7, 0.3)

(ii) P(V = 3), where V ~ B(10, 0.6)

(iii) P(W = 10), where W ~ B(14, 0.42)

(iv)  P(Z  <  2),  where  Z  ~  5(6,  |)  □ 

(4)  The  geometric  distribution 

The  geometric  distribution  arises  from  another  aspect  of  the  same  model  as  the 
binomial  distribution:  that  of  a  sequence  of  Bernoulli  trials.  In  this  course  we 
shall  meet  very  many  situations  where  the  same  physical  process  can  be  studied 
from  different  points  of  view,  giving  rise  to  different  random  variables.  Whereas 
Y  ~  B(n,p)  is  a  random  variable  giving  the  total  number  of  successes  in  n  trials, 
the  number  of  Bernoulli  trials  up  to  and  including  the  first  success  is  a  random 
variable  X  that  follows  a  geometric  distribution  with  parameter  p,  written 
$X \sim G_1(p)$. The range of X is {1, 2, ...}, so for this distribution the range is
infinite. The event [X = x] occurs if the first x − 1 trials result in failure and then
the xth trial results in a success. Hence, the probability function is

$$p(x) = q^{x-1} p, \quad x = 1, 2, \ldots.$$


Square  brackets  are  sometimes 
used  to  denote  an  event. 
An event associated with this distribution that is sometimes of interest is [X ≥ k]:
the event that at least k trials are required to achieve a success. The probability
of this event can be calculated directly by summation:

$$
\begin{aligned}
P(X \ge k) &= \sum_{x=k}^{\infty} p(x) \\
&= q^{k-1}p + q^{k}p + q^{k+1}p + \cdots \\
&= q^{k-1}p\,(1 + q + q^2 + \cdots) \\
&= \frac{q^{k-1}p}{1 - q} \\
&= q^{k-1}. \qquad (2.2)
\end{aligned}
$$

Alternatively this probability can be calculated by noting that the event [X ≥ k]
is equivalent to the event that the first k − 1 trials all result in failure. Since the
trials are independent, the probability of this latter event is $q^{k-1}$.
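
Result (2.2) can be confirmed numerically without summing an infinite series, because P(X ≥ k) = 1 − P(X ≤ k − 1) involves only finitely many terms. The following sketch is an illustration under our own choice of names and values (p = 1/4 and k = 3 are not taken from the text).

```python
from fractions import Fraction

def geometric_pmf(x, p):
    """p(x) = q^(x-1) p for X ~ G1(p), x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p

def geometric_tail(k, p):
    """Formula (2.2): P(X >= k) = q^(k-1)."""
    return (1 - p) ** (k - 1)

p = Fraction(1, 4)   # illustrative value, not taken from the text
k = 3

# 1 - P(X <= k-1) uses only finitely many terms, so the check is exact.
by_summation = 1 - sum(geometric_pmf(x, p) for x in range(1, k))
print(by_summation, geometric_tail(k, p))   # 9/16 9/16
```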

Question  2.3  Robert  is  a  keen  archer.  From  a  certain  distance,  the  probability 
that  he  hits  the  bull’s-eye  with  each  shot  is  0.3. 

(i)  Find  the  probability  that  he  requires  exactly  three  shots  to  hit  the  bull’s-eye. 

(ii)  What  is  the  probability  that  he  requires  more  than  five  shots  to  hit  the 
bull’s-eye?  □ 

The  geometric  distribution  can  also  occur  with  range  {0, 1,2,.. .},  which  includes 
the  value  0.  Consider  the  random  variable  Z,  the  number  of  successful  trials 
before  failure  occurs  (but  not  including  the  trial  at  which  the  failure  occurs).  For 
example,  a  trial  could  be  the  use  of  a  piece  of  machinery  and  Z  would  represent 
the  number  of  times  it  is  used  before  breakdown.  Here  the  smallest  value  of  Z  is  0, 
corresponding  to  the  machine  being  broken  down  before  its  first  attempted  use  (in 
this  case,  there  are  no  successes  before  the  first  failure).  The  distribution  of  Z  is 

$$p(z) = q\,p^z, \quad z = 0, 1, 2, \ldots.$$

We write $Z \sim G_0(p)$ and $\Omega_Z = \{0, 1, 2, \ldots\}$.

Question  2.4  The  probability  that  Sarah  can  go  correctly  through  the 
procedures  at  a  cashpoint  outside  a  bank  is  0.8,  and  the  probabilities  at  different 
visits  are  independent.  If  Sarah  makes  a  mistake,  her  card  is  not  returned  to  her. 
What  is  the  probability  that  she  makes  fewer  than  five  withdrawals  before  losing 
her  card?  □ 

Notice  that  different  abbreviations  are  used  for  the  two  different  versions  of  the 
geometric distribution: $G_0(p)$ when the range is {0, 1, 2, ...} and $G_1(p)$ when the
range  is  {1,  2,  3,  . . .}.  These  abbreviations  are  useful  as  they  provide  a  clear  and 
concise  way  of  specifying  which  geometric  distribution  is  being  used  in  any  given 
situation. 

(5)  The  negative  binomial  distribution 

The  negative  binomial  distribution  is  a  generalization  of  the  geometric 
distribution. The number X of Bernoulli trials up to and including the kth
success has a negative binomial distribution with parameters (k, p). The
number of trials is at least k, so for this distribution the range is {k, k + 1, ...}.
The probability function is derived by noting that if the random variable X takes
the value x, then in the first x − 1 trials, k − 1 successes, and hence x − k failures,
are recorded. Then a success must be recorded at the xth trial. Thus the
probability of occurrence of the event [X = x] is given by the binomial
distribution (2.1) with n = x − 1 and y = k − 1, multiplied by p. Hence for the
negative binomial distribution,

$$p(x) = \binom{x-1}{k-1} p^k q^{x-k}, \quad x = k, k + 1, \ldots.$$
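
The derivation just described (a binomial probability of k − 1 successes in the first x − 1 trials, multiplied by p for the success at the xth trial) translates directly into code. The sketch below is illustrative: the function name `negative_binomial_pmf` and the value p = 1/2 are our own choices.

```python
from fractions import Fraction
from math import comb

def negative_binomial_pmf(x, k, p):
    """P(X = x), where X is the number of trials up to and including the kth success."""
    if x < k:
        return Fraction(0)
    q = 1 - p
    # k - 1 successes in the first x - 1 trials (Formula (2.1)), then a success.
    return comb(x - 1, k - 1) * p ** (k - 1) * q ** (x - k) * p

p = Fraction(1, 2)   # illustrative value, not taken from the text
print([str(negative_binomial_pmf(x, 2, p)) for x in range(2, 6)])
# ['1/4', '1/4', '3/16', '1/8']

# With k = 1 the formula reduces to the geometric distribution G1(p).
assert all(negative_binomial_pmf(x, 1, p) == (1 - p) ** (x - 1) * p
           for x in range(1, 10))
```

The final check reflects the statement that the negative binomial distribution generalizes the geometric distribution: the two agree when k = 1.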


For $|x| < 1$,

$$1 + x + x^2 + \cdots = \frac{1}{1 - x}.$$

This  result  is  given  in  the 
Handbook. 


This  form  of  the  geometric 
distribution  is  not  discussed  in 
M246. 
Question 2.5 If Robert shoots arrows at the target until he has hit the
bull’s-eye  three  times  (see  Question  2.3),  find  the  probability  that  he  shoots 
exactly  seven  arrows.  □ 

As  with  the  geometric  distribution,  the  negative  binomial  distribution  sometimes 
occurs  with  range  {0, 1, 2, . . .}.  It  is  then  the  distribution  of  the  number  of 
successes  occurring  before  k  failures  occur  in  a  sequence  of  Bernoulli  trials.  Both 
versions  are  included  in  the  table  of  discrete  probability  distributions  on  page  4  of 
the  Handbook,  so  you  can  look  up  the  formulas  for  the  probability  functions 
whenever  you  need  them. 

(6) The Poisson distribution

The last distribution in our armoury of common discrete distributions is the
Poisson. The random variable Z has a Poisson distribution with parameter $\mu$ if

$$p(z) = \frac{e^{-\mu}\mu^z}{z!}, \quad z = 0, 1, \ldots,$$

and we write Z ~ Poisson($\mu$). The Poisson distribution is closely linked with the
Poisson  process,  which  we  shall  discuss  in  Unit  2.  It  provides  a  satisfactory  model 
for  such  diverse  situations  as  misprints  on  a  page,  goals  in  a  football  match,  white 
corpuscles  in  a  small  sample  of  blood,  and  eggs  laid  by  a  bird.  Tables  of  Poisson 
probabilities are available in Neave, pages 14 to 16. However, if the value of the
parameter $\mu$ you require is not listed, then you will need to use your calculator.

The  next  two  questions  can  be  answered  using  either  the  tables  in  Neave  or  a 
calculator.  Make  sure  that  you  can  use  either  method. 

Question  2.6  The  number  of  misprints  on  a  page  in  a  book  may  be  modelled 
by  a  Poisson  distribution  with  parameter  2.5.  Find  the  probability  that  a  page: 

(i)  contains  no  misprints; 

(ii)  contains  more  than  three  misprints.  □ 

The Poisson distribution is also the limit of the binomial distribution as $n \to \infty$
and $p \to 0$ in such a way that np remains fixed and equal to $\mu$. It thus provides a
model for situations that can be described by a large number of Bernoulli trials
with a very small probability of success. That is, if n is large and p is small,

$$B(n, p) \approx \text{Poisson}(np),$$

where $\approx$ is read as ‘has approximately the same distribution as’.
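
The quality of this approximation can be explored by tabulating the two probability functions side by side. The sketch below is an illustration only; the function names and the values n = 1000, p = 0.002 (so that np = 2) are our own choices and do not come from the text.

```python
from math import comb, exp, factorial

def binomial_pmf(y, n, p):
    """Formula (2.1) for Y ~ B(n, p)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def poisson_pmf(z, mu):
    """Probability function of Z ~ Poisson(mu)."""
    return exp(-mu) * mu**z / factorial(z)

n, p = 1000, 0.002      # illustrative values: n large, p small, np = 2
for z in range(5):
    # Columns: value, exact binomial probability, Poisson approximation.
    print(z, round(binomial_pmf(z, n, p), 4), round(poisson_pmf(z, n * p), 4))
```

Repeating the comparison with a smaller n and a larger p (keeping np fixed) shows the two columns drifting apart, which is why the approximation is recommended only when p is small.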

Question  2.7  The  probability  that  a  man’s  sight  is  colour-deficient  is 
about  0.06.  Write  down  the  exact  distribution  of  the  number  of  men  who  have 
colour-deficient  sight  out  of  a  group  of  100.  Find  the  approximate  probability 
that,  in  a  group  of  100  men,  fewer  than  three  have  colour-deficient  sight.  □ 

The  final  question  in  this  subsection  will  give  you  some  practice  at  choosing  an 
appropriate  probability  distribution. 

Question  2.8  The  probability  that  a  randomly  selected  car  is  white  is  0.2. 

State  precisely  the  distribution  of  each  of  the  following  random  variables. 

(i)  X,  where  X  =  1  if  the  next  car  to  pass  me  is  white  and  X  =  0  otherwise. 

(ii)  Y ,  the  number  of  cars  passing  me  up  to  and  including  the  first  white  one. 

(iii)  W ,  the  number  of  cars  that  are  not  white  in  a  car  park  containing  50  cars. 

(iv)  V ,  the  number  of  white  cars  passing  me  before  the  third  non-white  one.  □ 
2.3 Expectation


The mean or expected value or expectation of a discrete random variable X,
denoted by $\mu$ or E(X), is given by the formula

$$\mu = E(X) = \sum_{x \in \Omega_X} x\,p_X(x). \qquad (2.3)$$

Note that we also refer to E(X) as the ‘mean of the distribution of X’.

That is, the mean is calculated by finding the sum of the products $x\,p_X(x)$, where
the sum is taken over the range of X.

Example 2.4

The expected value of a Bernoulli variate X ~ B(1, p) is

$$\mu = E(X) = 0 \times q + 1 \times p = p.$$  □

Question 2.9 Find the expected value of a random variable X having the
following probability function.

x      1    2    3    4

p(x)   0.4  0.3  0.2  0.1

In Example 2.4 and Question 2.9 the calculation involved in finding the mean was
straightforward. However, this is not always so. There is an alternative formula
for the mean which occasionally simplifies the calculation. The formula is derived
below, so that you can see how it arises. You will not be expected to
reproduce the derivation, just to apply the result.

The alternative formula for the mean contains the cumulative distribution
function of X, F(x), which is given by

$$F(x) = P(X \le x), \quad x \in \mathbb{R}.$$

The cumulative distribution function (c.d.f.) is defined for all $x \in \mathbb{R}$. When X is a
discrete variate it is a step function with jumps at each point in the range and no
others.

The alternative formula for the mean applies only when the range $\Omega_X$ is a subset
of {0, 1, 2, ...}, that is, when the only possible values x of X are non-negative
integers, so that P(X ≥ 0) = 1. In this case, for each $x \in \Omega_X$,

$$F(x) = P(X \le x) = p(0) + p(1) + \cdots + p(x).$$

In order to derive the alternative formula for the mean it is convenient to
introduce the function Q defined by

$$Q(x) = 1 - F(x), \quad x \in \Omega_X.$$

Then

$$Q(x) = 1 - P(X \le x) = P(X > x) = \sum_{i=x+1}^{\infty} p(i).$$
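Now

$$
\begin{aligned}
\mu = E(X) &= \sum_{x=0}^{\infty} x\,p(x), \quad \text{as defined in Formula (2.3),} \\
&= p(1) + 2p(2) + 3p(3) + \cdots \\
&= \{p(1) + p(2) + p(3) + \cdots\} \\
&\quad + \{p(2) + p(3) + \cdots\} \\
&\quad + \{p(3) + \cdots\} + \cdots \\
&= Q(0) + Q(1) + Q(2) + \cdots \\
&= \sum_{x=0}^{\infty} Q(x) \\
&= \sum_{x=0}^{\infty} (1 - F(x)).
\end{aligned}
$$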

This  result  is  summarized  in  the  box  below  and  its  use  is  illustrated  in  the 
following  simple  example. 


$$\mu = E(X) = \sum_{x=0}^{\infty} (1 - F(x)) \quad \text{for } \Omega_X \subseteq \{0, 1, 2, \ldots\}. \qquad (2.4)$$


Example  2.5 

Values  of  the  c.d.f.  of  the  random  variable  X  of  Question  2.9  are  given  in  the 
table  below. 

x  0  1  2  3  4  5 

p(x)  0  0.4  0.3  0.2  0.1  0  ... 

F(x)  0  0.4  0.7  0.9  1  1  ... 


Using  Formula  (2.4)  gives 

$$
\begin{aligned}
\mu = E(X) &= \sum_{x=0}^{\infty} (1 - F(x)) \\
&= (1 - F(0)) + (1 - F(1)) + (1 - F(2)) + \cdots \\
&= 1 + 0.6 + 0.3 + 0.1 + 0 + 0 + \cdots \\
&= 2,
\end{aligned}
$$

the  same  as  the  value  obtained  in  Question  2.9  using  Formula  (2.3).  □ 
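
The agreement between Formulas (2.3) and (2.4) in this example can be reproduced in a few lines. The sketch below is illustrative only: the variable names are our own, and `itertools.accumulate` is used to build the c.d.f. values F(0), F(1), ... from the probability function.

```python
from fractions import Fraction
from itertools import accumulate

# Probability function from Example 2.5, indexed by x = 0, 1, 2, 3, 4.
p = [Fraction(0), Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]

# Formula (2.3): sum of x p(x) over the range.
mean_direct = sum(x * px for x, px in enumerate(p))

# Formula (2.4): sum of 1 - F(x); the terms vanish once F(x) reaches 1.
cdf = list(accumulate(p))
mean_tail = sum(1 - fx for fx in cdf)

print(mean_direct, mean_tail)   # 2 2
```

The second sum can stop at x = 4 because every later term 1 − F(x) is zero; for a distribution with infinite range, such as the geometric, the series would have to be summed analytically instead.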


When  an  explicit  expression  exists  for  the  c.d.f.,  as  is  the  case  for  either  form  of 
the  geometric  distribution,  for  instance,  Formula  (2.4)  provides  an  easy  way  of 
calculating  the  mean  of  a  distribution.  However,  this  happens  only  rarely  for 
discrete  random  variables;  the  continuous  analogue,  which  is  discussed  in 
Section  4,  is  a  much  more  useful  result. 

The  mean  of  a  random  variable  gives  a  measure  of  its  average  value  and  is 
obviously  a  useful  summary  of  its  possible  values.  It  is  also  important  to  have  a 
measure  of  its  spread,  and  this  is  most  conveniently  given  by  the  variance,  which 
is the expected value of $(X - \mu)^2$, where $\mu = E(X)$.


If  you  are  interested  in  seeing  how 
to  apply  Formula  (2.4)  to  find  the 
mean  of  a  geometric  distribution, 
or  wish  to  try  this  for  yourself, 
then  refer  to  the  Problem  Booklet 
(Question  1.10). 
The mean of a function g of a random variable X may be calculated by using the
formula

$$E[g(X)] = \sum_{x \in \Omega_X} g(x)\,p_X(x). \qquad (2.5)$$

The variance is therefore given by

$$E[(X - \mu)^2] = \sum_{x \in \Omega_X} (x - \mu)^2\,p(x). \qquad (2.6)$$

The variance of X is denoted by V(X) or $\sigma^2$.

Calculation  of  expectations  is  often  simplified  by  using  the  result 

$$E\left[\sum_{i=1}^{k} a_i\,g_i(X)\right] = \sum_{i=1}^{k} a_i\,E[g_i(X)], \qquad (2.7)$$

where each $a_i$ (i = 1, ..., k) is a constant; this enables the calculation of an
expectation  of  a  complicated  function  of  X  to  be  split  up  into  the  sum  of  simpler 
calculations.  For  the  variance  of  X,  it  leads  to 


$$
\begin{aligned}
V(X) &= E[(X - \mu)^2] \\
&= E(X^2 - 2\mu X + \mu^2) \\
&= E(X^2) - 2\mu E(X) + \mu^2, \quad \text{using Result (2.7),} \\
&= E(X^2) - \mu^2. \qquad (2.8)
\end{aligned}
$$

This  formula  is  often  simpler  to  apply  than  is  Formula  (2.6). 

The standard deviation of X is denoted by $\sigma$ and is defined to be the square
root of the variance: $\sigma = \sqrt{V(X)}$.


The  result 
E(c) = c,

for  any  constant  c,  has  also  been 
used  here. 


Example  2.6 

For  a  Bernoulli  variate  X 

$$
\begin{aligned}
E(X^2) = \sum x^2 p(x) &= 0^2 \times p(0) + 1^2 \times p(1) \\
&= 0^2 \times q + 1^2 \times p = p,
\end{aligned}
$$

and $\mu = E(X) = p$ (from Example 2.4). So, using Formula (2.8),

$$V(X) = E(X^2) - \mu^2 = p - p^2 = p(1 - p) = pq.$$

The standard deviation of X is $\sqrt{V(X)} = \sqrt{pq}$. □
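
Formulas (2.6) and (2.8) must always give the same value, and this is easy to confirm for any probability function with a finite range. The sketch below is an illustration using the score on a fair die; the function names and the dictionary representation are our own choices.

```python
from fractions import Fraction

def mean(pmf):
    """Formula (2.3): sum of x p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance_by_definition(pmf):
    """Formula (2.6): sum of (x - mu)^2 p(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_by_shortcut(pmf):
    """Formula (2.8): E(X^2) - mu^2."""
    return sum(x**2 * p for x, p in pmf.items()) - mean(pmf) ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}   # score on a fair die
print(variance_by_definition(die), variance_by_shortcut(die))   # 35/12 35/12
```

Formula (2.8) needs only one pass over the range once the mean is known, which is why it is usually the more convenient of the two for hand calculation as well.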

Question  2.10  Find  the  variance  of  the  random  variable  X  of  Question  2.9.  □ 

Although  means  and  variances  of  most  simple  discrete  distributions  can  be 
calculated  directly  using  the  formulas  above,  the  summations  can  sometimes  be  a 
little  messy.  It  is  often  simpler  to  use  a  function  called  the  probability  generating 
function.  This  is  introduced  in  the  next  section. 

The  mean  and  variance  of  X  +  Y 

To  complete  this  section,  we  state  without  proof  two  results  concerning  the  mean 
and  variance  of  a  sum  of  two  random  variables  X  and  Y. 


For  any  two  random  variables  X  and  Y, 

$$E(X + Y) = E(X) + E(Y). \qquad (2.9)$$

If X and Y are independent then

$$V(X + Y) = V(X) + V(Y). \qquad (2.10)$$
The first result (2.9) says that the mean of a sum of any two random variables is
equal  to  the  sum  of  their  means.  The  second  result  (2.10)  says  that  if,  in  addition, 
the  random  variables  are  independent,  then  the  variance  of  their  sum  is  equal  to 
the  sum  of  their  variances.  We  shall  make  use  of  these  results  on  many  occasions 
during  the  course.  In  the  final  question  in  this  section,  you  are  asked  to  use  them. 
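
Both results, and the need for independence in the second, can be illustrated with two dice. The sketch below is our own illustration (the function names are hypothetical): it builds the probability function of X + Y for two independent dice, then contrasts it with the dependent case Y = X.

```python
from fractions import Fraction
from itertools import product

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    return sum(x**2 * p for x, p in pmf.items()) - mean(pmf) ** 2

def sum_of_independent(pmf_x, pmf_y):
    """Probability function of X + Y when p(x, y) = p_X(x) p_Y(y)."""
    total = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        total[x + y] = total.get(x + y, 0) + px * py
    return total

die = {x: Fraction(1, 6) for x in range(1, 7)}
two_dice = sum_of_independent(die, die)
print(mean(two_dice), variance(two_dice))     # 7 35/6

# Dependent case: Y = X, so X + Y = 2X. Result (2.9) still holds, (2.10) does not.
doubled = {2 * x: p for x, p in die.items()}
print(mean(doubled), variance(doubled))       # 7 35/3
```

In both cases the mean is 7 = 7/2 + 7/2, but the variance equals 35/12 + 35/12 = 35/6 only when the two scores are independent.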

Question  2.11  Find  the  mean  and  variance  of  the  sum  of  two  independent 
random  variables  X  and  Y  if: 

(i) $X \sim \text{Poisson}(\mu_1)$, $Y \sim \text{Poisson}(\mu_2)$;

(h)  x-GUD.y-Gxd). 

The  means  and  variances  of  standard  discrete  distributions  are  given  in  the  table 
of  discrete  probability  distributions  on  page  4  of  the  Handbook.  You  may  quote 
the  results  given  in  this  table.  □ 


Using  Results  (2.9)  and  (2.10)  we  can  find  the  mean  and  variance  of  the  sum  of 
two  independent  random  variables,  but  this  does  not  tell  us  what  the  distribution 
of  the  sum  is.  In  the  next  section  you  will  see  how  probability  generating 
functions  can  be  used  to  find  this  distribution. 


3  Probability  generating  functions 


In  the  previous  section,  you  saw  that  the  expected  value  of  a  function  g(X)  of  a 
discrete  random  variable  X  with  probability  function  px(x)  is  given  by 
Formula  (2.5): 

$$E[g(X)] = \sum_{x \in \Omega_X} g(x)\,p_X(x). \qquad (3.1)$$

Finding  the  expectation  of  one  particular  function  g(X)  gives  a  new  function 
called  the  probability  generating  function  (p.g.f.).  This  function  encapsulates  the 
distribution  of  X  in  a  concise  mathematical  form.  In  Subsection  3.1,  the 
probability  generating  function  is  defined,  its  properties  are  discussed  and 
examples  are  given.  Using  a  probability  generating  function  provides  a  way  of 
calculating  the  mean  and  variance  of  a  distribution  without  having  to  sum  series: 
this  method  is  described  in  Subsection  3.2.  Finally,  in  Subsection  3.3,  you  will  see 
how  probability  generating  functions  can  be  used  to  find  the  distribution  of  a  sum 
of independent random variables. The probability generating function is an
important  mathematical  tool  in  the  study  of  random  processes.  It  is  used  very 
frequently  from  Unit  4  onwards. 


3.1  The  probability  generating  function 

There  are  two  equivalent  definitions  of  the  probability  generating  function.  The 
first is a neat definition mathematically and a convenient one for deriving some
results.  However,  it  is  the  equivalent  definition  that  is  useful  in  practice,  so  this  is 
the  one  that  you  need  to  remember.  We  shall  begin  with  the  ‘mathematical’ 
definition,  then  move  quickly  to  the  other.  (Thereafter,  we  shall  use  the  first 
definition  only  where  doing  so  simplifies  greatly  the  derivation  of  a  result— this  is 
the  case,  for  instance,  in  Subsection  3.3.) 


Some  of  these  results  will  be 
derived  in  the  next  section  using 
probability  generating  functions. 




The  probability  generating  function  (p.g.f.)  is  defined  for  any  discrete 
random variable X whose range is a subset of {0, 1, 2, ...}. It is denoted by $\Pi_X(s)$
and is defined to be the expectation of the function $g(X) = s^X$; that is,

$$\Pi_X(s) = E(s^X). \qquad (3.2)$$

The  subscript  X  is  often  omitted  if  this  does  not  lead  to  any  ambiguity. 

The expectation $E(s^X)$ is a function of s. Note that s is not a random variable: it
is a dummy variable and has no particular interpretation. We could just as well
use another letter instead of s. You will see this explicitly when we come to use
the second definition of $\Pi_X(s)$, and when we come to use it in practice.

In  general,  finding  an  expectation  involves  a  loss  of  information:  for  instance, 
knowing  the  mean  of  a  random  variable  tells  you  very  little  about  its 
distribution, since many different random variables can have the same mean. However,
no  information  is  lost  when  a  p.g.f.  is  found:  the  original  distribution  can  be 
reconstructed  from  the  p.g.f.  This  is  what  makes  the  p.g.f.  so  useful.  This 
property  is  evident  from  the  second  definition,  which  we  shall  now  obtain. 

Using  Formula  (3.1),  we  can  write  the  definition  of  the  probability  generating 
function $\Pi(s)$ of X in the form

$$\Pi(s) = \sum_{x \in \Omega_X} s^x\,p(x) = \sum_{x=0}^{\infty} p(x)\,s^x,$$

since $\Omega_X$, the range of X, is a subset of {0, 1, 2, ...}.


The probability generating function (p.g.f.) of a discrete random
variable X, whose range is a subset of {0, 1, 2, ...}, is

$$\Pi(s) = \sum_{x=0}^{\infty} p(x) s^x; \tag{3.3}$$

that is,

$$\Pi(s) = p(0) + p(1)s + p(2)s^2 + \cdots + p(x)s^x + \cdots.$$


As you can see, the p.g.f. $\Pi(s)$ is either a polynomial or a power series in s whose
coefficients are the probabilities p(x): the coefficient of $s^x$ is p(x) = P(X = x).

So, given a probability function p(x), the p.g.f. is obtained by writing down the
series in which the constant term is p(0), the coefficient of s is p(1), the coefficient
of $s^2$ is p(2), and so on.


Example  3.1 

The random variable X has the following probability function.

x      1     2
p(x)   0.6   0.4

The p.g.f. of X is

$$\Pi(s) = 0.6s + 0.4s^2.$$  □

Example  3.2 

The random variable Y has the following probability function.

y      0     1     2
p(y)   0.2   0.6   0.2

The p.g.f. of Y is

$$\Pi(s) = 0.2 + 0.6s + 0.2s^2.$$  □
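The rule "the coefficient of $s^x$ is p(x)" can be checked numerically. The following is a minimal Python sketch, not part of the course text: the helper name `pgf` and the dictionary representation of a probability function are assumptions made for illustration.

```python
def pgf(prob):
    """Return the function Pi(s) for a probability function {value: probability}."""
    def Pi(s):
        # Sum of p(x) * s^x over the range of the random variable.
        return sum(p * s**x for x, p in prob.items())
    return Pi

# Probability functions of Examples 3.1 and 3.2.
pi_x = pgf({1: 0.6, 2: 0.4})          # Pi(s) = 0.6 s + 0.4 s^2
pi_y = pgf({0: 0.2, 1: 0.6, 2: 0.2})  # Pi(s) = 0.2 + 0.6 s + 0.2 s^2

print(pi_x(0.5))  # 0.6 * 0.5 + 0.4 * 0.25
print(pi_y(0.0))  # only the constant term p(0) survives at s = 0
print(pi_y(1.0))  # the probabilities sum to 1
```

Evaluating at s = 0 and s = 1 picks out p(0) and the total probability respectively.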


It is generally assumed that $|s| \le 1$.
Question 3.1  Write down the probability generating function for each of the
following random variables.

(i)  The random variable X with the following probability function.

x      1     2     3
p(x)   0.5   0.4   0.1

(ii)  The  random  variable  X  where  X  is  the  number  showing  on  a  fair  die. 

(iii)  The  random  variable  X  that  takes  the  value  7  with  probability  1.  □ 

A  random  variable  that  takes  a  single  value  with  probability  1  is  called 
degenerate.  The  random  variable  in  part  (iii)  of  Question  3.1,  which  takes  the 
value  7  with  probability  1,  is  an  example  of  a  degenerate  random  variable.  In 
general,  if  a  random  variable  X  takes  the  value  k  with  probability  1,  then 

$$p(k) = P(X = k) = 1,$$
$$p(x) = P(X = x) = 0 \quad \text{for } x \neq k,$$

so the p.g.f. of X is

$$\Pi(s) = s^k.$$

We  shall  use  this  result  later  in  this  section  and  in  Units  7  and  8. 

Example  3.3 

Given a p.g.f. $\Pi(s)$, we can obtain the probability p(x) = P(X = x) for any
particular value of x by finding the coefficient of $s^x$ in $\Pi(s)$. For example, if X has
p.g.f.

$$\Pi(s) = \tfrac{1}{4}(1 + s)^2,$$

then we can rewrite this as

$$\Pi(s) = \tfrac{1}{4} + \tfrac{1}{2}s + \tfrac{1}{4}s^2,$$

so

$$p(0) = \tfrac{1}{4}, \quad p(1) = \tfrac{1}{2}, \quad p(2) = \tfrac{1}{4}.$$  □
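Reading probabilities off a p.g.f. amounts to expanding a polynomial and listing its coefficients. The sketch below does this for $\Pi(s) = \tfrac{1}{4}(1 + s)^2$ with exact fractions; the helper `poly_mul` and the coefficient-list representation (index = power of s) are illustrative assumptions.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of s)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

one_plus_s = [Fraction(1), Fraction(1)]           # 1 + s
expanded = poly_mul(one_plus_s, one_plus_s)        # (1 + s)^2 = 1 + 2s + s^2
probs = [Fraction(1, 4) * c for c in expanded]     # coefficients of (1/4)(1 + s)^2

for x, p in enumerate(probs):
    print(f"p({x}) = {p}")
```

The list `probs` holds p(0), p(1), p(2) in order, and its entries sum to 1, as any probability function must.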

Question  3.2  Find  the  probability  functions  for  the  random  variables  with  the 
following  p.g.f.s. 

(i)  $\Pi(s) = \tfrac{1}{4}s(1 + s + 2s^2)$

(ii)  $\Pi(s) = \tfrac{1}{9}(2 + s^2)^2$  □

The  above  simple  examples  demonstrate  how  to  obtain  the  p.g.f.  given  the 
probability  function  for  a  random  variable  and  how  to  reconstruct  a  probability 
function  from  a  p.g.f.  Before  looking  at  the  p.g.f.s  of  some  standard  distributions, 
we  shall  summarize  briefly  some  of  the  basic  properties  of  p.g.f.s. 

Suppose that $\Pi(s)$ is the p.g.f. of a random variable X:

$$\Pi(s) = \sum_{x=0}^{\infty} p(x)s^x = p(0) + p(1)s + p(2)s^2 + \cdots.$$

First note that putting s = 0 gives

$$\Pi(0) = p(0) + 0 + 0 + \cdots = p(0),$$

and so

$$\Pi(0) = p(0) = P(X = 0).$$

Also, putting s = 1 gives

$$\Pi(1) = p(0) + p(1) + p(2) + \cdots = 1,$$

since p is a probability function.
It has already been observed that the coefficient of $s^x$ in the p.g.f. is p(x) for each
x. Thus the probability function determines the p.g.f. uniquely and vice versa.
This property is one reason for the importance of the p.g.f.

The  above  properties  are  listed  below  for  convenience. 


Properties  of  the  p.g.f. 

1  The coefficient of $s^x$ in $\Pi(s)$ is p(x):

$$\Pi(s) = \sum_{x=0}^{\infty} p(x)s^x = p(0) + p(1)s + p(2)s^2 + \cdots.$$

2  The p.g.f. determines a distribution uniquely and vice versa.

3  $\Pi(0) = p(0) = P(X = 0)$

4  $\Pi(1) = 1$


Several  more  properties  will  be  derived  in  Subsections  3.2  and  3.3.  But  first  we 
shall  obtain  the  p.g.f.s  of  some  standard  distributions. 


Example  3.4  The  p.g.f.  of  a  geometric  distribution 

If X ~ $G_1(p)$, then

$$p(x) = q^{x-1}p, \quad x = 1, 2, \ldots.$$

The p.g.f. of X is given by

$$\Pi(s) = \sum_{x=0}^{\infty} p(x)s^x = \sum_{x=1}^{\infty} q^{x-1}p\,s^x = p(s + qs^2 + q^2s^3 + q^3s^4 + \cdots) = ps(1 + qs + (qs)^2 + (qs)^3 + \cdots).$$

The series in brackets is a geometric progression in powers of qs; its sum is
$\dfrac{1}{1 - qs}$ (provided $|qs| < 1$). Hence the p.g.f. of a geometric distribution
starting at 1 and with parameter p is

$$\Pi(s) = \frac{ps}{1 - qs}.$$  □

Recall that, for $|x| < 1$,
$$1 + x + x^2 + \cdots = \frac{1}{1 - x}.$$

Under the general assumption that $|s| \le 1$, it follows that $|qs| < 1$.
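As a numerical check on this closed form, the partial sums of the series can be compared with $ps/(1 - qs)$. This is only a sketch: the function names and the values p = 0.3, s = 0.8 are arbitrary choices made for illustration.

```python
def geometric_pgf_partial(p, s, terms=200):
    """Partial sum of sum_{x>=1} q^(x-1) p s^x for a G1(p) distribution."""
    q = 1 - p
    return sum(q**(x - 1) * p * s**x for x in range(1, terms + 1))

def geometric_pgf_closed(p, s):
    """Closed form ps / (1 - qs), valid when |qs| < 1."""
    q = 1 - p
    return p * s / (1 - q * s)

p, s = 0.3, 0.8                      # illustrative values, not from the text
approx = geometric_pgf_partial(p, s)
exact = geometric_pgf_closed(p, s)
print(approx, exact)
```

With |qs| = 0.56 the neglected tail after 200 terms is far below floating-point precision, so the two values agree to rounding error.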


Question  3.3  Find  the  p.g.f.  of  a  G0(p)  distribution.  □ 


Example  3.5  The  p.g.f.  of  a  binomial  distribution 

Suppose that X ~ B(n, p). Then the probability function of X is

$$p(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n.$$

So the p.g.f. of X is

$$\Pi(s) = \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x} s^x = \sum_{x=0}^{n} \binom{n}{x} (ps)^x q^{n-x} = (q + ps)^n, \quad \text{by the Binomial Theorem.}$$

A statement of the Binomial Theorem is given as Result 11.8 on
page 6 of the Handbook.

Thus the p.g.f. of the binomial distribution with parameters n, p is

$$\Pi(s) = (q + ps)^n.$$  □

Question 3.4  Show that the p.g.f. of a Poisson distribution with parameter $\mu$ is
given by

$$\Pi(s) = e^{-\mu(1-s)}.$$

(Hint: $e^{\theta} = 1 + \theta + \dfrac{\theta^2}{2!} + \cdots + \dfrac{\theta^n}{n!} + \cdots$.)  □

The  p.g.f.s  of  some  common  distributions  are  given  in  Table  3.1.  The  entries  in 
this  table  are  also  given  in  the  table  of  discrete  probability  distributions  on  page  4 
of  the  Handbook. 


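Table 3.1  Probability generating functions of some standard distributions

Distribution           Probability function               Range            p.g.f.
Bernoulli, B(1, p)     $p^x q^{1-x}$                      0, 1             $q + ps$
Binomial, B(n, p)      $\binom{n}{x} p^x q^{n-x}$         0, 1, ..., n     $(q + ps)^n$
Poisson($\mu$)         $e^{-\mu}\mu^x / x!$               0, 1, ...        $e^{-\mu(1-s)}$
Geometric, $G_1(p)$    $q^{x-1} p$                        1, 2, ...        $ps/(1 - qs)$
Negative binomial      $\binom{x-1}{r-1} p^r q^{x-r}$     r, r + 1, ...    $\left(ps/(1 - qs)\right)^r$
Geometric, $G_0(p)$    $q^x p$                            0, 1, ...        $p/(1 - qs)$
Negative binomial      $\binom{r+x-1}{r-1} p^r q^x$       0, 1, ...        $\left(p/(1 - qs)\right)^r$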


The  results  in  this  table  may  be  quoted  whenever  they  are  required.  They  can  be 
used  either  to  write  down  the  p.g.f.  of  a  given  distribution  or  to  identify  a 
distribution.  The  following  examples  illustrate  how  this  is  done. 


Example  3.6 


Write  down  the  p.g.f.  of  a  random  variable  X  if: 

(i)  X~B(6,§);  (ii)  X~G0(§). 

Solution 

(i)  If  X  ~  B(n,p),  then  the  p.g.f.  of  X  is 

$$\Pi(s) = (q + ps)^n.$$

Putting  n  =  6  and  p  =  §,  q  =  f  gives  the  p.g.f.  of  a  B(6,  §)  distribution: 

HW  =  (§  +  §*)6. 

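(ii)  If X ~ $G_0(p)$, then the p.g.f. of X is

$$\Pi(s) = \frac{p}{1 - qs}.$$

Putting $p = \tfrac{3}{5}$, $q = \tfrac{2}{5}$ gives the p.g.f. of a $G_0(\tfrac{3}{5})$ distribution:

$$\Pi(s) = \frac{\tfrac{3}{5}}{1 - \tfrac{2}{5}s} = \frac{3}{5 - 2s}.$$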


□ 


Question  3.5  Write  down  the  p.g.f.  of  a  random  variable  X  if: 
(i)  X  ~  Poisson(4);  (ii)  X  ~  Gi(|).  □ 
Example 3.7

Identify the distribution of a random variable X if its p.g.f. is given by:

(i)  $\Pi(s) = \dfrac{3s}{4 - s}$;  (ii)  $\Pi(s) = \left(\dfrac{1}{4 - 3s}\right)^3$.

Solution 


To  identify  each  distribution  we  need  to  rewrite  the  p.g.f.  so  that  its  form  matches 
one  of  those  in  Table  3.1. 


(i)  This p.g.f. is similar to that for a $G_1(p)$ distribution, but with the number 4
in the denominator instead of 1. Dividing numerator and denominator by 4
gives

$$\Pi(s) = \frac{3s}{4 - s} = \frac{\tfrac{3}{4}s}{1 - \tfrac{1}{4}s},$$

and this is the p.g.f. of a $G_1(\tfrac{3}{4})$ distribution.

(ii)  This p.g.f. is similar to that of a negative binomial distribution with range
{0, 1, ...} and parameters r, p: $\left(\dfrac{p}{1 - qs}\right)^r$. Rewriting the given p.g.f. to
match this form gives

$$\Pi(s) = \left(\frac{1}{4 - 3s}\right)^3 = \left(\frac{\tfrac{1}{4}}{1 - \tfrac{3}{4}s}\right)^3.$$

This is the p.g.f. of a negative binomial distribution (starting at 0) with
parameters r = 3, $p = \tfrac{1}{4}$.  □


Question  3.6  For  each  of  the  probability  generating  functions  listed,  identify 
the  corresponding  probability  distribution. 

(i)  n(s)  =  i(2  +  s)4  (ii)  n(s)  = 

=  (iv)  n(s)  =  ^-i-^  □ 

In  this  subsection  the  probability  generating  function  has  been  defined  and  its 
basic  properties  described.  In  the  next  subsection  you  will  see  how  it  can  be  used 
to  find  the  mean  and  variance  of  a  distribution. 


3.2  Calculating  means  and  variances 

In  Subsection  2.3,  the  mean  and  variance  of  a  distribution  were  found  using  the 
following  formulas: 

$$\mu = E(X) = \sum_{x \in \Omega_X} x\,p(x), \tag{3.4}$$

$$\sigma^2 = V(X) = E(X^2) - \mu^2 = \sum_{x \in \Omega_X} x^2 p(x) - \mu^2. \tag{3.5}$$

In  all  but  the  simplest  cases,  this  involves  summing  series,  and  can  be  a  tedious 
exercise.  However,  when  the  range  of  a  random  variable  is  a  subset  of  {0, 1, . . .}, 
the  p.g.f.  exists  and,  as  you  will  now  see,  it  can  be  used  to  derive  formulas  for  the 
mean  and  variance.  Note  that  you  will  not  be  expected  to  reproduce  these 
derivations,  just  to  use  the  formulas. 

From Formula (3.3),

$$\Pi(s) = \sum_{x=0}^{\infty} p(x)s^x = p(0) + p(1)s + p(2)s^2 + p(3)s^3 + \cdots.$$

Differentiating this with respect to s gives

$$\Pi'(s) = \sum_{x=0}^{\infty} x\,p(x)s^{x-1} = p(1) + 2p(2)s + 3p(3)s^2 + \cdots,$$

and putting s = 1 we have

$$\Pi'(1) = \sum_{x=0}^{\infty} x\,p(x) = p(1) + 2p(2) + 3p(3) + \cdots.$$

So we have the result

$$\mu = E(X) = \Pi'(1). \tag{3.6}$$

Differentiating a second time with respect to s gives

$$\Pi''(s) = \sum_{x=0}^{\infty} x(x - 1)\,p(x)s^{x-2},$$

and putting s = 1 we have

$$\Pi''(1) = \sum_{x=0}^{\infty} x(x - 1)\,p(x) = \sum_{x=0}^{\infty} x^2 p(x) - \sum_{x=0}^{\infty} x\,p(x) = E(X^2) - E(X) = E(X^2) - \mu.$$

Hence

$$E(X^2) = \Pi''(1) + \mu$$

and, since $\sigma^2 = V(X) = E(X^2) - \mu^2$, we have the result

$$\sigma^2 = V(X) = \Pi''(1) + \mu - \mu^2. \tag{3.7}$$

The  two  formulas  (3.6)  and  (3.7)  are  important  results  and  are  repeated  in  the 
box  below  for  easy  reference. 


If  X  is  a  random  variable  with  probability  generating  function  n(s),  then 
the  mean  and  variance  of  X  may  be  found  using  the  following  formulas: 


$$\mu = E(X) = \Pi'(1); \tag{3.6}$$

$$\sigma^2 = V(X) = \Pi''(1) + \mu - \mu^2. \tag{3.7}$$

Formulas  (3.6)  and  (3.7)  provide  a  method  of  calculating  the  mean  and  the 
variance  of  any  random  variable  whose  p.g.f.  exists.  It  is  much  easier  to  use  these 
formulas  to  find  the  means  and  variances  of  many  of  the  common  distributions 
than  it  is  to  use  the  definitions  (3.4)  and  (3.5)  directly.  An  example  is  given  below. 

Example  3.8  The  mean  and  variance  of  a  binomial  distribution 

If X ~ B(n, p) then the p.g.f. of X is

$$\Pi(s) = (q + ps)^n.$$

Differentiating with respect to s gives

$$\Pi'(s) = np(q + ps)^{n-1},$$

so

$$\Pi'(1) = np(q + p)^{n-1} = np(1)^{n-1} = np.$$

Differentiating a second time gives

$$\Pi''(s) = n(n - 1)p^2(q + ps)^{n-2},$$

so

$$\Pi''(1) = n(n - 1)p^2(q + p)^{n-2} = n(n - 1)p^2(1)^{n-2} = n(n - 1)p^2.$$

Now using Formulas (3.6) and (3.7), we have

$$\mu = \Pi'(1) = np$$

and

$$\sigma^2 = \Pi''(1) + \mu - \mu^2 = n(n - 1)p^2 + np - n^2p^2 = np - np^2 = npq.$$  □
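Formulas (3.6) and (3.7) can be checked directly from the coefficients of a p.g.f., since $\Pi'(1) = \sum x\,p(x)$ and $\Pi''(1) = \sum x(x-1)\,p(x)$. The sketch below does this exactly for a binomial distribution; the function name `mean_and_variance` and the values n = 6, p = 1/3 are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction
from math import comb

def mean_and_variance(coeffs):
    """coeffs[x] = p(x). Uses mu = Pi'(1) and sigma^2 = Pi''(1) + mu - mu^2."""
    d1 = sum(x * p for x, p in enumerate(coeffs))            # Pi'(1)
    d2 = sum(x * (x - 1) * p for x, p in enumerate(coeffs))  # Pi''(1)
    return d1, d2 + d1 - d1**2

n, p = 6, Fraction(1, 3)   # illustrative parameters
q = 1 - p
binom = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

mu, var = mean_and_variance(binom)
print(mu, var)       # should match np and npq
print(n * p, n * p * q)
```

Because exact fractions are used, the computed values agree with np and npq without rounding error.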

Question  3.7  Use  the  probability  generating  function  to  find  the  mean  and 
variance  of  each  of  the  following  distributions. 

(i)  Poisson($\mu$)  (ii)  $G_1(p)$  □

The  means  and  variances  of  all  the  standard  distributions  are  given  on  page  4  of 
the  Handbook.  You  may  quote  such  results  whenever  they  are  required.  However, 
for  any  non-standard  distribution  you  will  need  to  use  either  the  definitions  of 
Section  2  or  the  method  of  this  subsection.  If  the  p.g.f.  has  a  succinct  form,  such 
as  those  in  Table  3.1,  then  using  Formulas  (3.6)  and  (3.7)  is  usually  the  simpler  of 
the  two  methods. 


3.3  Sums  of  independent  random  variables 

Probability  generating  functions  can  also  be  used  to  find  the  distribution  of  a  sum 
of  independent  discrete  random  variables.  Suppose  that  X  and  Y  are  two  such 
variates,  the  range  of  each  being  a  subset  of  {0, 1, . . .},  and  that  Z  =  X  +  Y.  Here 
is  how  we  would  have  to  proceed  without  p.g.f.s. 

We  could  find  the  probability  function  of  Z  directly  as  follows: 

$$p_Z(0) = P(Z = 0) = P(X = 0 \text{ and } Y = 0) = p_X(0)\,p_Y(0),$$

$$p_Z(1) = P(Z = 1) = P([X = 0 \text{ and } Y = 1] \text{ or } [X = 1 \text{ and } Y = 0]) = p_X(0)\,p_Y(1) + p_X(1)\,p_Y(0),$$

and  so  on.  Clearly,  except  for  very  simple  cases,  this  exercise  could  be  quite 
lengthy  and  tedious. 

In  general,  it  is  much  simpler  to  find  the  p.g.f.  of  Z  in  terms  of  the  p.g.f.s  of  X 
and  Y,  and  then  to  identify  the  distribution  of  Z  from  its  p.g.f. 

We shall derive a general result below. You can use this important result without
proof. Let $\Pi_X(s)$, $\Pi_Y(s)$ and $\Pi_Z(s)$ denote the p.g.f.s of X, Y and Z. Then, by
Definition (3.2),

$$\Pi_X(s) = E(s^X), \quad \Pi_Y(s) = E(s^Y) \quad \text{and} \quad \Pi_Z(s) = E(s^Z).$$

Now

$$\Pi_Z(s) = E(s^Z) = E(s^{X+Y}) = E(s^X s^Y).$$

Now, it is not difficult to show that, for any functions g(X) and h(Y) of
independent random variables X and Y,

$$E[g(X)h(Y)] = E[g(X)]\,E[h(Y)].$$

So, in particular,

$$E(s^X s^Y) = E(s^X)\,E(s^Y),$$

and hence

$$\Pi_Z(s) = E(s^X)\,E(s^Y) = \Pi_X(s)\,\Pi_Y(s).$$

So we have the following result.


If X and Y are independent random variables, the p.g.f. of their sum, Z = X + Y, is
the product of their p.g.f.s:

$$\Pi_Z(s) = \Pi_X(s)\,\Pi_Y(s). \tag{3.8}$$


Example  3.9 

Suppose that X and Y are independent and X ~ B(m, p), Y ~ B(n, p). Then

$$\Pi_X(s) = (q + ps)^m \quad \text{and} \quad \Pi_Y(s) = (q + ps)^n.$$

Therefore the p.g.f. of Z = X + Y is

$$\Pi_Z(s) = (q + ps)^m (q + ps)^n = (q + ps)^{m+n},$$

and hence Z ~ B(m + n, p).  □
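Multiplying two p.g.f.s is the same as multiplying two polynomials, and the coefficients of the product are the probabilities of the sum. The sketch below checks the result of Example 3.9 exactly for one choice of parameters; the helper names and the values m = 2, n = 3, p = 1/4 are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pf(n, p):
    """Probability function of B(n, p) as a list: entry x is p(x)."""
    q = 1 - p
    return [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

def poly_mul(a, b):
    """Coefficients of the product of two p.g.f.s (index = power of s)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

p = Fraction(1, 4)                                    # illustrative value
pz = poly_mul(binomial_pf(2, p), binomial_pf(3, p))   # p.g.f. of X + Y
print(pz == binomial_pf(5, p))                        # compare with B(5, p)
```

The inner double loop in `poly_mul` is exactly the direct calculation $p_Z(z) = \sum_x p_X(x)\,p_Y(z - x)$ described at the start of this subsection.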


Question 3.8  If X and Y are independent Poisson variates with parameters $\mu_1$
and $\mu_2$, find the p.g.f. of Z = X + Y and hence identify the distribution of Z.  □

Result  (3.8)  can  also  be  used  to  write  a  random  variable  as  a  sum  of  two  simpler 
independent  random  variables,  as  in  the  next  example. 


Example  3.10  Recurrent  events 

In Unit 13, a model for events that occur repeatedly is described. The p.g.f. of
$W_r$, the waiting time until the rth event, is found to be

$$\Pi(s) = \left(\tfrac{1}{2}s + \tfrac{1}{2}s^2\right)^r.$$

Unit 13, Example 1.2.

In particular, the p.g.f. of the waiting time $W_6$ until the sixth event is

$$\Pi(s) = \left(\tfrac{1}{2}s + \tfrac{1}{2}s^2\right)^6.$$

The p.g.f. can be rewritten as

$$\Pi(s) = s^6\left(\tfrac{1}{2} + \tfrac{1}{2}s\right)^6,$$

which is the product of two p.g.f.s, $s^6$ and $\left(\tfrac{1}{2} + \tfrac{1}{2}s\right)^6$. Now $s^6$ is the p.g.f. of a
degenerate random variable X that takes the value 6 with probability 1; and
$\left(\tfrac{1}{2} + \tfrac{1}{2}s\right)^6$ is the p.g.f. of a binomial variate Y with parameters 6 and $\tfrac{1}{2}$: Y ~ B(6, $\tfrac{1}{2}$).
Therefore

$$W_6 = 6 + Y, \quad \text{where } Y \sim B(6, \tfrac{1}{2}).$$

As you will see in Unit 13, being able to express the waiting time in this way as a
sum of two other random variables makes it possible to calculate probabilities
associated with the waiting time much more easily than would otherwise be the
case.  □

Result  (3.8)  extends  to  any  fixed  number  of  independent  discrete  random 
variables. 


If $Z = X_1 + X_2 + \cdots + X_n$, where $X_1, X_2, \ldots, X_n$ are independent discrete
variates with p.g.f.s $\Pi_{X_1}(s), \Pi_{X_2}(s), \ldots, \Pi_{X_n}(s)$, then

$$\Pi_Z(s) = \Pi_{X_1}(s) \times \Pi_{X_2}(s) \times \cdots \times \Pi_{X_n}(s). \tag{3.9}$$

Example 3.11


If $X_1, X_2, \ldots, X_n$ are independent Bernoulli variates, $X_i$ ~ B(1, p), i = 1, 2, ..., n,
then the p.g.f. of $Z = X_1 + X_2 + \cdots + X_n$ is

$$\Pi_Z(s) = (q + ps)(q + ps)\cdots(q + ps) = (q + ps)^n.$$

This  is  the  p.g.f.  of  a  binomial  distribution  with  parameters  n,  p.  So  the  sum  of  n 
independent  Bernoulli  variates  with  parameter  p  has  a  binomial  distribution, 
B(n,p).  This  confirms  a  result  given  in  Subsection  2.2.  □ 

Question 3.9  If $X_1, X_2, \ldots, X_n$ are independent $G_0(p)$ variates, find the p.g.f.
of $Z = X_1 + X_2 + \cdots + X_n$ and hence identify the distribution of Z.  □

In  this  section,  the  probability  generating  function  of  a  discrete  random  variable 
has  been  defined  and  you  have  seen  how  it  can  be  used  to  find  the  mean  and 
variance  of  a  distribution.  You  have  also  seen  how  p.g.f.s  can  be  used  to  find  the 
distribution  of  a  sum  of  independent  discrete  random  variables. 

The  results  given  at  the  end  of  Section  2  for  the  mean  and  variance  of  two 
independent  random  variables  can  also  be  derived  using  p.g.f.s.  You  may  like  to 
try  this.  (You  will  need  to  differentiate  Formula  (3.8)  and  use  Results  (3.6) 
and  (3.7).)  These  results  generalize  to  sums  of  more  than  two  random  variables  in 
an  obvious  way: 

$$E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n)$$

and, if $X_1, \ldots, X_n$ are independent,

$$V(X_1 + \cdots + X_n) = V(X_1) + \cdots + V(X_n).$$

In fact, many of the results obtained for the expectation of functions of discrete
random variables also hold for continuous random variables. In general, such
results will be stated and used for continuous random variables without proof.
Continuous random variables are discussed in the next two sections.


4  Continuous  random  variables 


In  Section  2,  we  revised  random  variables  each  of  whose  range  is  restricted  to 
some  countable,  though  possibly  infinite,  subset  of  the  real  line — in  other  words, 
discrete  random  variables.  In  this  section  we  discuss  the  basic  concepts  associated 
with  continuous  random  variables — that  is,  random  variables  that  can  take  any 
value  in  an  interval  of  the  real  line.  Several  specific  important  continuous 
distributions  are  described  in  Section  5.  Continuous  random  variables  are  usually 
those  that  arise  as  the  result  of  a  measurement,  as  opposed  to  discrete  variates, 
which  are  the  result  of  counting.  In  this  course,  time  will  be  the  most  common 
continuous  variate. 


4.1  Cumulative  distribution  functions  and  probability  density 
functions 

The  concept  of  the  cumulative  distribution  function  was  defined  in  Section  2  for  a 
discrete  variate,  and  the  same  definition  applies  to  a  continuous  random  variable. 
If  X  is  a  continuous  variate,  then  F(x),  its  cumulative  distribution  function 
(c.d.f.),  is  defined  by 

$$F(x) = P(X \le x), \quad x \in \mathbb{R}.$$

If there is any possibility of confusion, the random variable is included as a
subscript: $F_X(x)$.

Some properties of the c.d.f. follow immediately from its definition. For instance,
for every value of x, F(x) is a probability, so its value lies between 0 and 1
(inclusive). Also when $x_1 < x_2$, $P(X \le x_1) \le P(X \le x_2)$, so $F(x_1) \le F(x_2)$. So a
c.d.f. is a non-decreasing function with values between 0 and 1. And since a
random variable must take some value between $-\infty$ and $\infty$, $F(x) \to 1$ as
$x \to \infty$ and $F(x) \to 0$ as $x \to -\infty$.

The basic properties of the c.d.f. can be stated formally as follows:

(i)  $0 \le F(x) \le 1$, $x \in \mathbb{R}$;

(ii)  for any $x_1$ and $x_2$ such that $x_1 < x_2$, $F(x_1) \le F(x_2)$;

(iii)  $\lim_{x \to -\infty} F(x) = 0$;  $\lim_{x \to \infty} F(x) = 1$.

Example  4.1 

A random variable X has c.d.f. F(x) given by:

$$F(x) = \begin{cases} 0, & x < 1, \\ \tfrac{1}{3}(x^2 - 1), & 1 \le x \le 2, \\ 1, & x > 2. \end{cases}$$

A sketch of the c.d.f. is shown in Figure 4.1.


Figure 4.1  The c.d.f. of X


The c.d.f. can be used to calculate probabilities. For example:

$$P(X \le 1.5) = F(1.5) = \tfrac{1}{3}(1.5^2 - 1) \approx 0.417;$$

$$P(X > 1.4) = 1 - P(X \le 1.4) = 1 - F(1.4) = 1 - \tfrac{1}{3}(1.4^2 - 1) = 0.68;$$

$$P(1.2 < X \le 1.5) = P(X \le 1.5) - P(X \le 1.2) = F(1.5) - F(1.2) = \tfrac{1}{3}(1.5^2 - 1) - \tfrac{1}{3}(1.2^2 - 1) = 0.27.$$  □
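The three calculations of Example 4.1 translate directly into code once the c.d.f. is written as a piecewise function. This is a minimal sketch that assumes the c.d.f. is $F(x) = \tfrac{1}{3}(x^2 - 1)$ on [1, 2], the form consistent with the values 0.417, 0.68 and 0.27 quoted in the example.

```python
def F(x):
    """Piecewise c.d.f. of Example 4.1 (assumed form: (x^2 - 1)/3 on [1, 2])."""
    if x < 1:
        return 0.0
    if x <= 2:
        return (x**2 - 1) / 3
    return 1.0

p_le = F(1.5)                  # P(X <= 1.5)
p_gt = 1 - F(1.4)              # P(X > 1.4)
p_between = F(1.5) - F(1.2)    # P(1.2 < X <= 1.5)

print(round(p_le, 3), round(p_gt, 2), round(p_between, 2))
```

Defining F for all real x, including the constant pieces outside [1, 2], means that the same function can be used without special cases for any probability statement about X.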


Question 4.1  An oil change in a car takes between 2 and 10 minutes. The time
taken (in minutes) may be modelled by a random variable T with the
following cumulative distribution function:

$$F(t) = \begin{cases} 0, & t < 2, \\ \dfrac{t^2 - 4}{96}, & 2 \le t \le 10, \\ 1, & t > 10. \end{cases}$$

(i)  What is the probability that the oil change takes not more than 6 minutes?

(ii)  Find  c  if  the  probability  that  an  oil  change  is  completed  in  not  more  than  c 
minutes  is  equal  to  0.9. 

(iii)  Give  a  rough  sketch  of  the  c.d.f.  of  T.  □ 


Note that, in the question above, the c.d.f. was defined for all $t \in \mathbb{R}$. This is
formally correct, though in this course, when there is no fear of confusion, the
statement ‘F(t) = 0 for t < a’ will often be omitted. This is most likely to occur
when  the  random  variable  T  is  time  and  a  =  0.  It  will  usually  be  obvious  from  the 
context  that  T  cannot  be  less  than  zero.  For  clarity,  in  this  section,  the  c.d.f.  is 
defined  for  all  real  values  in  several  of  the  examples. 

The  c.d.f.  in  Question  4.1  is  a  continuous  function  of  t,  and  this  actually  provides 
a  mathematical  definition  of  a  continuous  random  variable:  it  is  a  random 
variable  whose  cumulative  distribution  function  is  continuous. 


In this course it will be assumed that F(x), the c.d.f. of X, is differentiable for all
$x \in \mathbb{R}$ except possibly at a finite number of points. The derivative of F(x) is
denoted by f(x) and is called the probability density function (p.d.f.). Hence,
when F'(x) exists,

$$F'(x) = f(x)$$

and

$$F(x) = \int_{-\infty}^{x} f(u)\,du.$$


For  each  point  where  F(x)  has  a  sudden  change  in  gradient,  so  that  it  is  not 
differentiable,  the  value  of  f(x)  may  be  chosen  at  will:  the  value  of  f(x)  at  such 
points  does  not  affect  any  probability  calculations,  and  convenience  determines 
the  choice.  (See  the  next  example  and  question.) 


The main properties of the p.d.f. are:

(i)  $f(x) \ge 0$, $x \in \mathbb{R}$;

(ii)  $P(a < X \le b) = \displaystyle\int_a^b f(x)\,dx$;

(iii)  $\displaystyle\int_{-\infty}^{\infty} f(x)\,dx = 1$.


Property  (i)  says  that  a  p.d.f.  cannot  be  negative.  Property  (ii)  means  that  the 
probability  that  X  takes  a  value  between  a  and  b  is  given  by  the  area  under  the 
graph  of  the  p.d.f.  between  a  and  b  (see  Figure  4.2).  Property  (iii)  says  that  the 
total  area  under  the  graph  of  the  p.d.f.  is  equal  to  1. 


u  is  a  dummy  variable. 


Figure  4.2 
Example 4.1, continued

Differentiating F(x) gives:

$$F'(x) = \begin{cases} 0, & x < 1, \\ \tfrac{2}{3}x, & 1 < x < 2, \\ 0, & x > 2. \end{cases}$$

The derivative cannot be calculated at x = 1 and x = 2. For convenience, we shall
choose to specify f at x = 1 and x = 2 by $f(x) = \tfrac{2}{3}x$. So we have:

$$f(x) = \begin{cases} \tfrac{2}{3}x, & 1 \le x \le 2, \\ 0, & \text{otherwise.} \end{cases}$$

In practice, we shall normally omit the line ‘0 otherwise’ and write this simply as

$$f(x) = \tfrac{2}{3}x, \quad 1 \le x \le 2.$$

(The line ‘0 otherwise’ is assumed.)

A  sketch  of  the  p.d.f.  is  shown  in  Figure  4.3. 


Figure  4.3  The  p.d.f.  of  X 

The  p.d.f.  may  be  used  to  calculate  probabilities  by  finding  appropriate  areas 
under  its  graph.  For  example,  the  probability  P(1.2  <  X  <  1.5)  is  given  by  the 
area  under  the  graph  of  the  p.d.f.  between  x  =  1.2  and  x  =  1.5  (see  Figure  4.4). 


Figure 4.4
So

$$P(1.2 < X \le 1.5) = \int_{1.2}^{1.5} f(x)\,dx = \int_{1.2}^{1.5} \tfrac{2}{3}x\,dx = \left[\tfrac{1}{3}x^2\right]_{1.2}^{1.5} = \tfrac{1}{3}(1.5^2 - 1.2^2) = 0.27,$$

as previously obtained using the c.d.f.  □
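The area interpretation can also be checked numerically. The sketch below approximates the integral of the p.d.f. with a simple midpoint rule; the helper `integrate` is an illustrative stand-in for any numerical integration routine, and the p.d.f. is assumed to be $f(x) = \tfrac{2}{3}x$ on [1, 2], as in Example 4.1.

```python
def f(x):
    """p.d.f. of Example 4.1 (assumed form: 2x/3 on [1, 2], 0 otherwise)."""
    return 2 * x / 3 if 1 <= x <= 2 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

area = integrate(f, 1.2, 1.5)   # P(1.2 < X <= 1.5)
total = integrate(f, 1.0, 2.0)  # total area under the p.d.f.
print(round(area, 4), round(total, 4))
```

The second value illustrates property (iii): the total area under the graph of the p.d.f. is 1.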

Question 4.2  The random variable T, defined in Question 4.1 by its c.d.f., has
p.d.f. f(t).

(i)  Calculate  f(t). 

(ii)  Use  the  p.d.f.  to  find  the  probability  that  the  oil  change  takes  more  than  six 
but  not  more  than  seven  minutes. 

(iii)  Sketch  the  p.d.f.  of  T.  □ 


To  find  the  c.d.f.  of  a  random  variable  from  the  p.d.f.  involves  integration.  By 
definition, 

$$F(x) = P(X \le x),$$

and this probability is found by calculating the area under the p.d.f. to the left of
x (see Figure 4.5). So

$$F(x) = \int_{-\infty}^{x} f(u)\,du.$$


Figure  4.5  The  shaded  area  is  P(X  <  x) 


Example  4.2 

Suppose the p.d.f. of a random variable X is given by

$$f(x) = \tfrac{2}{3}x, \quad 1 \le x \le 2.$$

This is the p.d.f. obtained in Example 4.1; a sketch is given in Figure 4.3.

Clearly, since X can only take values between 1 and 2, F(x) = 0 for x < 1 and
F(x) = 1 for x > 2. For $1 \le x \le 2$,

$$F(x) = \int_{-\infty}^{x} f(u)\,du = \int_{-\infty}^{1} 0\,du + \int_{1}^{x} \tfrac{2}{3}u\,du = 0 + \left[\tfrac{1}{3}u^2\right]_{1}^{x} = \tfrac{1}{3}(x^2 - 1).$$

Thus we have:

$$F(x) = \begin{cases} 0, & x < 1, \\ \tfrac{1}{3}(x^2 - 1), & 1 \le x \le 2, \\ 1, & x > 2. \end{cases}$$

Of course, this is the c.d.f. that we started with in Example 4.1.  □


Question  4.3  The  p.d.f.  of  a  random  variable  W  is  given  by 
/(in)  =  2  —  2  <  w  <  4. 

Find F(w), the c.d.f. of W.  □


4.2  Expectation 


The expectation of a random variable X with p.d.f. f(x) is defined analogously
to that of a discrete variate, with an integral replacing the sum:

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$

Also analogously to the discrete case, the expectation of a function g(X) of a
random variable X is given by

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$$

So, for example,

$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx.$$

The variance of X, V(X), is defined by

$$V(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.$$

Also, as in the discrete case, it turns out that

$$V(X) = E(X^2) - \mu^2;$$

this formula is usually the easier of the two to use for calculations.


Example  4.3 

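The p.d.f. of the random variable X of Examples 4.1 and 4.2 is

$$f(x) = \tfrac{2}{3}x, \quad 1 \le x \le 2.$$

So the mean or expected value of X is

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{1}^{2} x \cdot \tfrac{2}{3}x\,dx = \int_{1}^{2} \tfrac{2}{3}x^2\,dx = \left[\tfrac{2}{9}x^3\right]_{1}^{2} = \tfrac{16}{9} - \tfrac{2}{9} = \tfrac{14}{9} \approx 1.556.$$

Similarly,

$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_{1}^{2} x^2 \cdot \tfrac{2}{3}x\,dx = \int_{1}^{2} \tfrac{2}{3}x^3\,dx = \left[\tfrac{1}{6}x^4\right]_{1}^{2} = \tfrac{8}{3} - \tfrac{1}{6} = 2.5.$$

So the variance of X is given by

$$V(X) = E(X^2) - \mu^2 = 2.5 - \left(\tfrac{14}{9}\right)^2 \approx 0.080,$$

and the standard deviation of X is

$$\sqrt{V(X)} \approx 0.283.$$  □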
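These values can be reproduced by numerical integration. The following sketch again uses a hypothetical midpoint-rule helper `integrate`, and assumes the p.d.f. $f(x) = \tfrac{2}{3}x$ on [1, 2] from Examples 4.1 and 4.2.

```python
from math import sqrt

def f(x):
    """p.d.f. of Examples 4.1 and 4.2 (assumed form: 2x/3 on [1, 2])."""
    return 2 * x / 3 if 1 <= x <= 2 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

mean = integrate(lambda x: x * f(x), 1, 2)        # E(X)
second = integrate(lambda x: x**2 * f(x), 1, 2)   # E(X^2)
variance = second - mean**2                       # V(X) = E(X^2) - mu^2
print(round(mean, 3), round(second, 3), round(variance, 3), round(sqrt(variance), 3))
```

Only the interval [1, 2] needs to be integrated, because the p.d.f. is zero elsewhere.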


Question  4.4  Find  the  mean  and  variance  of  the  random  variable  W  whose 
p.d.f.  was  given  in  Question  4.3.  □ 

Question  4.5  Calculate  the  mean  and  variance  of  the  time  to  change  the  oil  in 
a  car,  using  the  distribution  in  Questions  4.1  and  4.2.  □ 


An  alternative  formula  for  the  mean 


There  is  an  alternative  formula  for  the  mean  of  a  continuous  random  variable  X 
that  can  take  only  non-negative  values,  which  is  analogous  to  Formula  (2.4)  for  a 
discrete  distribution.  The  formula  is  similar  to  Formula  (2.4),  but  it  contains  an 
integral  instead  of  a  sum.  It  is  stated  below  without  proof. 


$$E(X) = \int_{0}^{\infty} (1 - F(x))\,dx \quad \text{when } P(X \ge 0) = 1. \tag{4.1}$$

This  formula  is  particularly  useful  when  finding  the  mean  of  a  variate  whose  p.d.f. 
includes  the  exponential  function,  as  you  will  see  in  Section  5.  It  could  be  applied 
to  distributions  such  as  the  ones  given  in  Example  4.1  and  Question  4.1,  though 
the  integration  would  not  be  made  much  simpler  by  doing  so.  Two  examples  of 
the  use  of  Formula  (4.1)  are  given  below. 


Note that, for a continuous random variable X, the condition
$P(X \ge 0) = 1$ is equivalent to the condition $\Omega_X \subseteq [0, \infty)$.


Example  4.4 

The time to failure (in years) of a particular electrical component may be
modelled by a random variable X with c.d.f. given by

$$F(x) = \tfrac{1}{4}x^2, \quad 0 \le x \le 2.$$

(Also F(x) = 0 for x < 0 and F(x) = 1 for x > 2.)

Since X cannot take negative values, the condition $P(X \ge 0) = 1$ is satisfied and
Formula (4.1) can be used to find the mean time to failure as follows:

$$E(X) = \int_{0}^{\infty} (1 - F(x))\,dx = \int_{0}^{2} \left(1 - \tfrac{1}{4}x^2\right)dx + \int_{2}^{\infty} (1 - 1)\,dx = \left[x - \tfrac{1}{12}x^3\right]_{0}^{2} + 0 = \tfrac{4}{3}.$$

The mean time to failure is $\tfrac{4}{3}$ years or 16 months.  □

Example 4.5

A certain task always takes at least an hour to complete and sometimes takes
much longer. The time taken to complete the task (in hours) may be modelled by a
random variable T with c.d.f. given by

$$F(t) = 1 - \frac{1}{t^3}, \quad t \ge 1.$$

(Also F(t) = 0 for t < 1.)

Since T cannot take negative values, the condition $P(T \ge 0) = 1$ is satisfied and
Formula (4.1) can be used to find the mean time taken to complete the task. This
is given by

$$E(T) = \int_{0}^{\infty} (1 - F(t))\,dt = \int_{0}^{1} (1 - 0)\,dt + \int_{1}^{\infty} \left(1 - \left(1 - \frac{1}{t^3}\right)\right)dt = \left[t\right]_{0}^{1} + \left[-\frac{1}{2t^2}\right]_{1}^{\infty} = 1 + \tfrac{1}{2} = 1\tfrac{1}{2}.$$

So the mean time taken to complete the task is $1\tfrac{1}{2}$ hours.  □

An  important  point  to  note  from  these  examples  is  that  the  range  of  integration  is 
always  from  0  to  oo,  whatever  the  range  of  the  random  variable.  This  is  a  case 
where  you  need  to  remember  that  the  c.d.f.  is  defined  for  all  real  values,  not  just 
for values in the range of the random variable. For instance, in Example 4.4, we
used the fact that F(x) = 1 for x > 2, and, in Example 4.5, we used the
information that F(t) = 0 for $0 \le t < 1$. The next two questions will give you
some  practice  at  using  Formula  (4.1). 
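The point about the range of integration can be seen numerically for Example 4.4. The sketch below applies Formula (4.1) with a hypothetical midpoint-rule helper `integrate`, assuming the c.d.f. $F(x) = \tfrac{1}{4}x^2$ on [0, 2]; integrating beyond x = 2 adds nothing, because 1 − F(x) = 0 there.

```python
def F(x):
    """c.d.f. of Example 4.4, defined for all real x (assumed form: x^2/4 on [0, 2])."""
    if x < 0:
        return 0.0
    if x <= 2:
        return x**2 / 4
    return 1.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

survival = lambda x: 1 - F(x)               # integrand of Formula (4.1)
mean_to_2 = integrate(survival, 0, 2)       # contribution from [0, 2]
mean_to_5 = integrate(survival, 0, 5)       # extending the range changes nothing
print(round(mean_to_2, 4), round(mean_to_5, 4))
```

Both values approximate 4/3, the mean time to failure found in Example 4.4.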


Question  4.6  The  length  of  life  (in  years)  of  a  species  of  bird  may  be  modelled 
by  a  random  variable  T  with  c.d.f. 


F(t)  =  1  - 


1 

(TTTp’ 


t  >  o. 


Use  Formula  (4.1)  to  calculate  the  mean  lifetime  of  birds  of  this  species. 


□ 


Question 4.7  In Question 4.1, the time in minutes taken to complete an oil
change is modelled by a random variable T with c.d.f.

$$F(t) = \frac{t^2 - 4}{96}, \quad 2 \le t \le 10.$$

(Note that F(t) = 0 for t < 2 and F(t) = 1 for t > 10.) Use Formula (4.1) to
calculate the mean time taken to complete an oil change.  □


The  results  described  in  Sections  2  and  3  for  the  expectation  of  sums  of  discrete 
random  variables  are  also  valid  for  sums  of  continuous  random  variables.  For 
example, if $X_1, \ldots, X_n$ are continuous random variables then

$$E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n); \tag{4.2}$$

and if $X_1, \ldots, X_n$ are independent then

$$V(X_1 + \cdots + X_n) = V(X_1) + \cdots + V(X_n). \tag{4.3}$$

(The continuous random variables $X_1, \ldots, X_n$ are independent if the occurrence
of any event associated with any one of them is independent of the occurrence of
any event associated with any of the others.)

As was stated at the end of Section 3, many of the results obtained for the
expectation of functions of discrete random variables also hold for continuous
random variables. Such results will be used, when required, without
proof, beginning in the next section.


5  Specific  continuous  distributions 


In this section, the p.d.f.s of a number of common continuous distributions are
given and some of their important properties are described. You will have the
opportunity to apply some of the techniques and results discussed in Section 4.


5.1  The  uniform  distribution 


The random variable X with p.d.f.

$$f(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

is said to be uniformly distributed on the interval [a, b], and this is written
X ~ U(a, b). A sketch of the p.d.f. is shown in Figure 5.1. By symmetry, the
mean of X is ½(a + b).


Figure 5.1 The p.d.f. of X ~ U(a, b)


Question  5.1  Calculate  the  variance  of  the  uniform  distribution  U(a,b).  □ 


5.2  The  exponential  distribution 

The random variable X with p.d.f.

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0,$$

is said to have an exponential distribution with parameter λ, and this is
written X ~ M(λ). A sketch of the p.d.f. is shown in Figure 5.2.


Figure 5.2 The p.d.f. of X ~ M(λ)


Question  5.2 

(i) Write down the c.d.f. of the exponential distribution with parameter λ.

(ii)  Calculate  its  mean,  using  Formula  (4.1). 

(iii)  Calculate  its  variance. 

(Hint:  You  will  need  to  use  integration  by  parts.)  □ 


The  exponential  distribution  will  occur  very  frequently  throughout  this  course;  it 
is  by  far  the  most  common  continuous  distribution  that  will  be  used  for  modelling 
random  processes.  Usually,  the  random  variable  will  be  the  time  between  two 
events. 

The  memoryless  property 

One of the most remarkable properties of the exponential distribution is the
‘memoryless property’. Suppose, for example, that T, the duration of the time
interval between cars passing a certain point, has an exponential distribution;
that is, T ~ M(λ). Then, if an observer arrives at the kerb at any time, the time
until the next car passes is also exponentially distributed with parameter λ. It is
irrelevant how long an interval there has been with no car before the observer
arrives.

This property can be proved as follows. The relevant variables are illustrated in
Figure 5.3. Suppose one car passes at time t = 0, and that by time c, when the
observer arrives, no further car has passed. This implies that the time interval to
the next car has duration greater than c. Now let X be the time that the observer
has to wait before a car passes him: thus T = c + X.


Figure 5.3 Timeline: a car passes, the observer arrives, the next car passes

The probability that the observer has to wait longer than a time x is given by

$$\begin{aligned}
P(X > x) &= P(T > x + c \mid T > c) \\
&= \frac{P([T > x + c] \cap [T > c])}{P(T > c)} \quad \text{by the definition of conditional probability,} \\
&= \frac{P(T > x + c)}{P(T > c)} \\
&= \frac{1 - P(T \le x + c)}{1 - P(T \le c)} \\
&= \frac{e^{-\lambda(x + c)}}{e^{-\lambda c}}, \quad \text{since } 1 - F(t) = e^{-\lambda t}, \\
&= e^{-\lambda x}.
\end{aligned}$$

This is independent of c. So the distribution of X, the time to the next event
after time c, has c.d.f.

$$F(x) = P(X \le x) = 1 - e^{-\lambda x}$$

and hence is distributed as M(λ).

It  can  be  shown  (but  not  in  this  course)  that  not  only  does  the  exponential 
distribution  possess  the  memoryless  property,  but  it  is  the  only  continuous 
distribution  possessing  this  property. 
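
The memoryless property can also be checked numerically. The sketch below is illustrative only: the rate λ = 0.5, the waiting time x = 2 and the arrival times c are hypothetical values, and the function names are not taken from the course. It evaluates P(T > x + c | T > c) for several values of c and compares each with P(T > x).

```python
import math


def survival(t, lam):
    """P(T > t) for T ~ M(lam), that is, 1 - F(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def conditional_survival(x, c, lam):
    """P(T > x + c | T > c) = P(T > x + c) / P(T > c)."""
    return survival(x + c, lam) / survival(c, lam)


lam = 0.5  # hypothetical rate
x = 2.0    # hypothetical further waiting time
values = [conditional_survival(x, c, lam) for c in (0.0, 1.0, 5.0, 20.0)]

# Every entry should equal P(T > x), whatever the value of c.
print(values)
print(survival(x, lam))
```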

The  minimum  of  independent  exponential  variates 

Another useful result concerning exponential variates is the distribution of the
minimum of several independent exponential variates. Suppose that $T_i$,
$i = 1, 2, \ldots, n$, are n independent exponential variates with parameters $\lambda_i$, and
that T is the minimum of $T_1, T_2, \ldots, T_n$. Since T is the minimum of
$T_1, T_2, \ldots, T_n$, it follows that T > t if and only if $T_1 > t, T_2 > t, \ldots, T_n > t$. Hence
[T > t] is the multiple event

$$[T_1 > t] \cap [T_2 > t] \cap \cdots \cap [T_n > t].$$

Since the random variables $T_1, T_2, \ldots, T_n$ are independent, the occurrence of any
event associated with any one of them is independent of the occurrence of any
event associated with any of the others. It follows that

$$P(T > t) = P(T_1 > t)\,P(T_2 > t) \cdots P(T_n > t) = e^{-\lambda_1 t} e^{-\lambda_2 t} \cdots e^{-\lambda_n t}$$

since each $T_i$ has an exponential distribution. Hence

$$P(T > t) = e^{-(\lambda_1 + \lambda_2 + \cdots + \lambda_n)t}$$

and so

$$F_T(t) = 1 - e^{-(\lambda_1 + \lambda_2 + \cdots + \lambda_n)t};$$

that is, $T \sim M(\lambda_1 + \lambda_2 + \cdots + \lambda_n)$: the minimum time has an exponential
distribution with parameter $\sum_{i=1}^{n} \lambda_i$.

To  answer  the  next  question  you  will  need  to  use  this  result  and  the  memoryless 
property. 
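
The result for the minimum can be illustrated with a short calculation. In this sketch the three rates and the time t are hypothetical values (they are not those of the question that follows), and the function names are chosen for illustration. The product of the individual survival probabilities is compared with the survival probability of a single exponential variate whose parameter is the sum of the rates.

```python
import math


def survival_of_minimum(t, rates):
    """P(T > t) as the product of P(T_i > t) for independent T_i ~ M(rate_i)."""
    product = 1.0
    for lam in rates:
        product *= math.exp(-lam * t)
    return product


rates = [1.0, 0.25, 0.5]  # hypothetical parameters lambda_1, lambda_2, lambda_3
t = 1.2                   # hypothetical time

direct = survival_of_minimum(t, rates)
combined = math.exp(-sum(rates) * t)  # survival function of M(sum of the rates)

print(direct, combined)               # the two values agree
print(1 - combined)                   # P(T <= t) for the minimum
```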

Question  5.3  Suppose  you  arrive  to  find  three  telephone  kiosks  all  occupied, 
but  no  one  waiting.  The  occupants  are  a  man,  a  woman  and  a  teenager.  You 
know  from  past  experience  that  the  length  of  a  call  has  an  exponential 
distribution;  men’s  calls  have  a  mean  of  2  minutes,  women’s  and  teenagers’  calls 
have  means  of  5  minutes  and  10  minutes,  respectively.  What  is  the  distribution  of 
the time you have to wait? What is the probability that you have to wait less
than  2  minutes  for  a  vacant  kiosk?  □ 


5.3  The  gamma  distribution 


If $X_1, X_2, \ldots, X_n$ are independent identically distributed exponential variates,
each with parameter λ, then their sum

$$Y = X_1 + X_2 + \cdots + X_n$$

has a gamma distribution with parameters (n, λ); this is written Y ~ Γ(n, λ).
The p.d.f. of Y is

$$f(y) = \frac{y^{n-1} \lambda^n e^{-\lambda y}}{(n - 1)!}, \quad y \ge 0.$$


Figure 5.4 shows the shape of the p.d.f. for λ = 1 and for several values of n.


Figure 5.4 The p.d.f. of the gamma distribution Γ(n, 1)
for several values of n

If  the  duration  of  the  interval  between  successive  events  has  an  exponential 
distribution,  then  that  of  the  interval  between  one  event  and  the  nth  event  later 
has  a  gamma  distribution. 

The mean and variance of an exponential variate are 1/λ and 1/λ² respectively
(as shown in Question 5.2). Since a gamma variate is the sum of n independent
exponential variates, it has mean n/λ and variance n/λ² (using Results (4.2)
and (4.3)).

The  expression  for  the  c.d.f.  of  a  gamma  distribution  can  be  obtained  by  repeated 
integration,  but  it  has  no  simple  form. 
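
The mean n/λ and variance n/λ² can be checked against the gamma p.d.f. by numerical integration. The sketch below uses the midpoint rule; the parameter values n = 4 and λ = 2, the cut-off `upper` and the function names are assumptions made for illustration.

```python
import math


def gamma_pdf(y, n, lam):
    """p.d.f. of Gamma(n, lam): y^(n-1) * lam^n * exp(-lam*y) / (n-1)!"""
    return y ** (n - 1) * lam ** n * math.exp(-lam * y) / math.factorial(n - 1)


def gamma_moments(n, lam, upper=50.0, steps=50_000):
    """Return (total probability, mean, variance) by the midpoint rule on [0, upper]."""
    h = upper / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        w = gamma_pdf(y, n, lam) * h
        m0 += w
        m1 += y * w
        m2 += y * y * w
    return m0, m1, m2 - m1 ** 2


n, lam = 4, 2.0  # hypothetical parameters
total, mean, variance = gamma_moments(n, lam)
print(total, mean, variance)  # close to 1, n/lam = 2 and n/lam**2 = 1
```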


Question  5.4  Suppose  that  you  arrive  at  a  telephone  kiosk  to  find  it  occupied 
and  two  people  already  waiting  to  use  it.  You  know  from  past  experience  that  the 
length  of  a  call  has  an  exponential  distribution  with  mean  4  minutes.  Write  down 
the  distribution  of  the  time  you  will  have  to  wait  before  you  can  use  the  kiosk  and 
hence  find  the  mean  and  standard  deviation  of  your  waiting  time.  □ 


5.4  The  normal  distribution 


The continuous random variable X with p.d.f.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R},$$

is said to have a normal distribution with mean μ and variance σ²; this is
written X ~ N(μ, σ²). A sketch of this p.d.f. is shown in Figure 5.5. The normal
distribution is very useful, especially in statistics, as it provides a good
approximate model for many practical situations, in particular experimental
measurements and many biological quantities such as heights and crop yields.

If X ~ N(μ, σ²), then the random variable Z = (X − μ)/σ is also normally
distributed and has mean 0 and variance 1. This distribution is called the
standard normal distribution, written N(0, 1). Its p.d.f. is traditionally
denoted by φ(z), where

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad z \in \mathbb{R}.$$

This function is tabulated on pages 18 and 19 of Neave, together with the
corresponding c.d.f. Φ(z):

$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \phi(u)\,du.$$

Since Z = (X − μ)/σ, probabilities of events connected with the variate
X ~ N(μ, σ²) can be obtained from tables of Φ(z), where Z ~ N(0, 1). You will
need to be able to use the tables in Neave to find probabilities connected with
normally distributed random variables on several occasions during the course
(notably in Units 5 and 14). The use of these tables is illustrated in the next
example.


Figure 5.5 The p.d.f. of X ~ N(μ, σ²)


Example  5.1 


The  weights  of  packets  of  crisps  labelled  as  containing  100  grams  may  be  modelled 
by  a  normal  distribution  with  mean  102  grams  and  standard  deviation  1.6  grams. 
If W is the weight of a randomly selected packet then W ~ N(102, 1.6²).

The probability that a randomly selected packet weighs less than 100 grams is

$$P(W < 100) = P\left(Z < \frac{100 - 102}{1.6}\right) = \Phi(-1.25) = 0.1056,$$

using the table on page 18 of Neave.

If 99% of packets weigh less than w grams, then

$$P(W < w) = 0.99,$$

or, equivalently,

$$P\left(Z < \frac{w - 102}{1.6}\right) = 0.99.$$

So

$$\frac{w - 102}{1.6} = 2.3263, \quad \text{from Neave page 20.}$$

Hence

$$w = 102 + 1.6 \times 2.3263 \approx 105.7 \text{ grams},$$

so 99% of packets weigh less than approximately 105.7 grams. □
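
The two calculations in Example 5.1 can be reproduced without tables. The following sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) as a substitute for the tables in Neave; the variable names are chosen for illustration.

```python
from statistics import NormalDist

# W ~ N(102, 1.6^2): NormalDist takes the standard deviation, not the variance.
weight = NormalDist(mu=102, sigma=1.6)

p_under = weight.cdf(100)    # P(W < 100)
w99 = weight.inv_cdf(0.99)   # the value w with P(W < w) = 0.99

print(round(p_under, 4))     # agrees with 0.1056 from the table
print(round(w99, 1))         # agrees with 105.7 grams
```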

Question 5.5 Suppose X ~ N(16, 5).

(i)  What  is  the  probability  that  X  lies  between  15  and  18? 

(ii)  Determine  c  such  that  P(X  >  c)  =  0.1.  □ 

5.5 The χ² distribution

The final distribution introduced in this section is the χ² distribution; its
definition is included here for completeness. There is no need for you to do any
exercises on the χ² distribution at this stage: just read this brief subsection
through quickly so that you know what it contains.

The distribution of the square of a standard normal variate is known as the
chi-squared distribution with one degree of freedom and is denoted χ²(1).
In general, if $Z_1, Z_2, \ldots, Z_n$ are independent standard normal variates, then
$W_n = Z_1^2 + Z_2^2 + \cdots + Z_n^2$ has a chi-squared distribution with n degrees of
freedom, denoted χ²(n). The p.d.f. of a χ²(n) variate has quite a complicated
form and is not needed for this course. However, you can get an idea of how the
shape of the p.d.f. changes as n increases from the sketch in Figure 5.6, which
shows the p.d.f. for four values of n.


Selected  values  of  the  chi-squared  distribution  for  different  values  of  n  are  given 
on  page  21  of  Neave.  You  will  need  to  use  these  tables  in  Unit  3.  If  you  are 
unsure  how  to  use  them  then,  when  you  study  Unit  5,  try  the  practice  exercises 
(Question  1.19)  in  the  Problem  Booklet. 

The  main  properties  of  all  the  distributions  described  in  this  section  are 
summarized  in  the  table  of  continuous  probability  distributions  given  on  page  4  of 
the Handbook. As was the case for discrete distributions, you can quote any such
results whenever they are required.


6 Simulation


As mentioned in the introduction to this unit, a major feature of this course is the
development of models for random phenomena. A useful method of investigating
such models is simulation. In Subsection 6.1, simulation for continuous
distributions is discussed and, in Subsection 6.2, simulation for discrete
distributions is described.


6.1  Simulation  for  continuous  distributions 

We  begin  with  a  theorem.  This  theorem  is  important  because  it  leads  directly  to 
a  straightforward  method  for  simulating  observations  on  any  continuous  random 
variable.  Before  explaining  this  method  and  giving  some  examples,  we  shall  prove 
the  theorem.  You  will  not  be  expected  to  reproduce  the  proof:  it  is  included  for 
interest  and  completeness.  In  practice,  you  will  be  expected  to  use  the  result,  not 
prove  it. 


The Probability-integral Transformation

If X is a continuous random variable with c.d.f. F(x), then the variate
U = F(X) is uniformly distributed on [0, 1].

(This theorem holds even when F(x) is constant over an interval of
values of x. The proof requires a slight modification in this case.)


To prove this, we first note that when F(x) is a continuous increasing function, it
is one-to-one and its inverse function $F^{-1}(x)$ exists. Also, since U = F(X) and F
is the c.d.f. of X, we have 0 ≤ U ≤ 1. Now the c.d.f. of U is

$$\begin{aligned}
F_U(u) &= P(U \le u) \\
&= P(F(X) \le u) \\
&= P(X \le F^{-1}(u)), \quad \text{since } F(x) \text{ is an increasing function,} \\
&= F(F^{-1}(u)) \\
&= u, \quad 0 \le u \le 1.
\end{aligned}$$

By differentiation, $F_U'(u) = f_U(u) = 1$ for 0 < u < 1, and so U ~ U(0, 1). □

This result is used to simulate observations of a random variable X with c.d.f.
F(x) by solving the equation F(x) = u for x, where u is an observation from
U(0, 1). This procedure is illustrated in Figure 6.1.

Figure 6.1 Simulation of a continuous random variable
with c.d.f. F(x) using an observation from U(0, 1)

Example 6.1

The random variable X of Example 4.1 has c.d.f. given by:

$$F(x) = \begin{cases} 0 & x < 1 \\ \tfrac{1}{3}(x^2 - 1) & 1 \le x \le 2 \\ 1 & x > 2 \end{cases}$$

Given a random observation u = 0.8165 from U(0, 1), we can simulate an
observation from the distribution of X by solving the equation F(x) = u. That is,

$$\tfrac{1}{3}(x^2 - 1) = 0.8165,$$

which gives

x ≈ 1.857 (taking the positive square root since x > 0).

The simulated observation is 1.857. □
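
The calculation in Example 6.1 can be sketched in a few lines of Python. Solving ⅓(x² − 1) = u for the positive root gives x = √(1 + 3u); the function names below are chosen for illustration and are not part of the course.

```python
import math


def cdf(x):
    """c.d.f. of Example 6.1: (x^2 - 1)/3 on [1, 2], 0 below and 1 above."""
    if x < 1:
        return 0.0
    if x > 2:
        return 1.0
    return (x * x - 1) / 3


def inverse_cdf(u):
    """Solve F(x) = u for 0 < u < 1, taking the positive square root."""
    return math.sqrt(1 + 3 * u)


x = inverse_cdf(0.8165)
print(round(x, 3))  # 1.857, as in Example 6.1
```

Applying `cdf` to the simulated value returns the original observation u, which is a convenient check on the algebra.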

Question 6.1 Find the c.d.f. of the random variable X with p.d.f.

$$f(x) = 4x e^{-2x^2}, \quad x \ge 0,$$

and use the c.d.f. to simulate an observation from the distribution using the
random observation 0.4287 from U(0, 1). □

A  table  of  random  digits  is  given  on  page  42  of  Neave.  Each  digit  in  the  table  was 
generated  by  a  process  which  was  equally  likely  to  give  any  one  of  the  ten  digits 
0, 1, ..., 9. Random numbers from U(0, 1) can be formed from the digits in this
table  by  taking  groups  of  digits  and  placing  a  decimal  point  in  front.  For  example, 
using  groups  of  five  digits,  the  top  row  would  be  read  as  0.02484,  0.88139,  etc. 
Using  the  digits  in  this  way,  the  table  may  be  used  together  with  the 
Probability-integral  Transformation  to  simulate  observations  from  any  continuous 
probability  distribution. 

Question  6.2  The  lengths  of  calls  from  a  telephone  kiosk  are  exponentially 
distributed  with  mean  5  minutes.  Use  the  first  four  groups  of  digits  from  the  third 
row  of  the  table  on  page  42  of  Neave  to  simulate  the  lengths  of  four  telephone 
calls.  □ 

In Question 6.2, you were asked to simulate observations from an exponential
distribution with mean 5 minutes (parameter λ = 1/5) by using observations from
U(0, 1). However, a table of random numbers from M(1) is available in Neave on
page 43. Simulated observations from M(λ) are obtained by dividing these
numbers by λ (or equivalently by multiplying by the mean 1/λ).
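
Both routes to an exponential observation can be sketched as follows. For the exponential c.d.f. F(x) = 1 − e^{−λx}, solving F(x) = u gives x = −ln(1 − u)/λ. The two uniform numbers are the groups 0.02484 and 0.88139 quoted above from the top row of the table; the value 1.0 used as an M(1) observation is hypothetical, as are the function names.

```python
import math


def exponential_from_uniform(u, lam):
    """Solve 1 - exp(-lam * x) = u for x, where u is an observation from U(0, 1)."""
    return -math.log(1 - u) / lam


def exponential_from_standard(e, lam):
    """Convert an observation e from M(1) into an observation from M(lam)."""
    return e / lam


lam = 1 / 5  # mean 5 minutes
uniforms = [0.02484, 0.88139]
calls = [exponential_from_uniform(u, lam) for u in uniforms]
print([round(x, 2) for x in calls])

# A hypothetical M(1) observation of 1.0 scales to the mean, 5 minutes.
print(exponential_from_standard(1.0, lam))
```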

Question  6.3  Use  the  table  of  random  numbers  from  exponential  distributions 
on  page  43  of  Neave  to  simulate  the  lengths  of  four  telephone  calls  from  the 
telephone  kiosk  of  Question  6.2.  Use  the  first  four  numbers  in  the  top  row  of  the 
table.  □ 

The  Probability-integral  Transformation  could  also  be  used  in  conjunction  with 
the  tables  on  pages  18  and  19  of  Neave  to  simulate  observations  from  a  normal 
distribution.  However,  it  is  much  simpler  to  use  the  table  of  random  numbers 
from  normal  distributions  given  on  page  43  of  Neave.  The  numbers  in  this  table 
are observations from a standard normal distribution, N(0, 1). Observations from
a N(μ, σ²) distribution may be simulated by transforming the numbers in this
table in the following way. Given a random number z from N(0, 1), the number

$$x = \sigma z + \mu$$

is a random observation from N(μ, σ²).


(A random observation from U(0, 1) is sometimes referred to as a
‘uniform random number’.)

Example 6.2

An animal forages for food along a hedgerow in such a way that its distance (in
metres) away from its burrow an hour after leaving it may be modelled by a
random variable X with normal distribution X ~ N(20, 100).

The random numbers from normal distributions on page 43 of Neave can be used
to simulate its position an hour into a foraging expedition. For instance, using the
numbers in the top row of the table, its distance from the burrow on each of three
occasions may be found as follows.

Since μ = 20 and σ = √100 = 10, for each number z from the table, we must
calculate x = 10z + 20. So the three simulated distances are:

$x_1 = 10 \times 0.5117 + 20 \approx 25.1$ metres;

$x_2 = 10 \times (-0.6501) + 20 \approx 13.5$ metres;

$x_3 = 10 \times (-0.0240) + 20 \approx 19.8$ metres. □
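
The transformation used in Example 6.2 is simple enough to sketch directly. The function name below is an assumption made for illustration; the three standard normal numbers are those used in the example.

```python
import math


def from_standard_normal(z, mu, variance):
    """Transform an observation z from N(0, 1) into one from N(mu, variance)."""
    return math.sqrt(variance) * z + mu


zs = [0.5117, -0.6501, -0.0240]
distances = [round(from_standard_normal(z, 20, 100), 1) for z in zs]
print(distances)  # [25.1, 13.5, 19.8]
```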

Question  6.4  Use  the  first  three  numbers  in  the  tenth  row  of  the  table  of 
random  numbers  from  normal  distributions  on  page  43  of  Neave  to  simulate  three 
observations from a N(4, 9) distribution. □


6.2  Simulation  for  discrete  distributions 

The  main  feature  of  any  simulation  scheme  for  a  discrete  distribution  is  that 
digits  (or  groups  of  digits)  must  be  allocated  to  outcomes  in  such  a  way  that  the 
probability  of  any  particular  outcome  occurring  in  the  simulation  is  equal  to  the 
probability  of  that  outcome  in  the  distribution.  For  some  discrete  distributions,  it 
is possible to construct a simulation scheme using single digits or pairs of digits
instead  of  complete  groups  of  digits  from  the  table  on  page  42  of  Neave.  Very 
frequently,  such  a  scheme  will  be  simpler  than  one  using  more  digits.  We  begin 
with  some  examples  of  schemes  of  this  type.  Then  a  general  method  is  described 
which  uses  groups  of  digits;  this  method  is  analogous  to  that  used  in 
Subsection  6.1  for  continuous  distributions. 

Example  6.3 

Tosses  of  a  fair  coin  may  be  simulated  using  single  random  digits  by  allocating 
digits  to  outcomes  according  to  the  following  scheme. 


Digit            Outcome
0, 1, 2, 3, 4    Head
5, 6, 7, 8, 9    Tail

In this scheme, the same number of digits is allocated to a head (h) as to a tail
(t), so the two outcomes are equally likely to occur. Using digits from the bottom
row of Neave's table to simulate the outcomes of ten tosses of the coin gives the
following results.

Digit     1  9  6  0  2  7  7  5  7  5
Outcome   h  t  t  h  h  t  t  t  t  t


The  simulation  has  resulted  in  three  heads  and  seven  tails  in  ten  tosses  of  the 
coin.  □ 
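
The allocation scheme of Example 6.3 can be written as a short function. This is a sketch only: the function name is hypothetical, and the digit string is the one read from the table in the example.

```python
def toss_from_digit(digit):
    """Allocate digits 0-4 to a head ('h') and digits 5-9 to a tail ('t')."""
    return "h" if digit <= 4 else "t"


digits = [int(ch) for ch in "1960277575"]
outcomes = "".join(toss_from_digit(d) for d in digits)
heads = outcomes.count("h")

print(outcomes)                   # htthhttttt
print(heads, len(digits) - heads)  # 3 7
```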

Question  6.5  Describe  a  scheme  using  single  random  digits  for  simulating  the 
outcomes  of  a  series  of  games  between  two  children,  Alan  and  Barbara,  in  which 
Alan  has  a  probability  of  §  of  winning  each  game.  Use  the  fifth  row  of  digits  on 
page  42  of  Neave  to  simulate  the  outcomes  of  eight  games  between  the  two 
children. □

Question 6.6 Describe a scheme using pairs of random digits for simulating the
outcomes of the series of games in Question 6.5 if the probability that Alan wins
each game is 0.65. (You do not need to carry out a simulation.) □

In Example 6.3, the outcomes of ten tosses of a fair coin were simulated. In doing
so, the number of heads obtained in a sequence of ten tosses was also simulated
(three in that simulation); that is, an observation of a binomial random variable,
B(10, ½), was simulated. Similarly, in Question 6.5 an observation of a binomial
random variable, B(8, p), where p is Alan's probability of winning each game, was
simulated. A similar method could be used to
simulate observations from any binomial distribution, but in practice it could turn
out to be a tedious exercise. Consider, for example, what would be involved in
simulating six observations of a binomial random variable B(50, 0.37). For each
observation we would need to simulate fifty observations of a Bernoulli variate
B(1, 0.37) and count the number of successes. For six observations, we would have
to repeat this process six times. Clearly, another method is required.

A  general  method  for  simulating  observations  from  a  discrete  distribution,  which 
is  analogous  to  that  used  for  continuous  distributions  in  Subsection  6.1,  is 
illustrated  in  Figure  6.2  below.  It  makes  use  of  the  c.d.f.  of  the  distribution. 


Figure 6.2 Simulation of a discrete random variable X
using a simulated value u from U(0, 1)

Note that, if X is discrete, its c.d.f. is a step function. In Figure 6.2, a horizontal
line drawn at height u meets the c.d.f. at x = 3, so the simulated value of X is
x = 3. In fact, for any value u such that F(2) < u ≤ F(3), the value u will give
rise to a simulated value x = 3.

In general, given an observed value u from U(0, 1), the simulation procedure
involves choosing x to satisfy

$$F(x - 1) < u \le F(x);$$

then x is the simulated value corresponding to the observed value u from U(0, 1).
The procedure is illustrated in the next example.


Example  6.4 

In this example the random numbers $u_1 = 0.12023$, $u_2 = 0.82328$, $u_3 = 0.54810$
from the sixteenth row of the table on page 42 of Neave will be used to simulate
three observations from a binomial distribution, B(8, 0.3).

To simulate observations, we must first calculate values of the c.d.f.; these are
obtained by summing values of the probability function. The probabilities may be
taken from page 4 of Neave or a calculator can be used.

x       0        1        2        3        4        ...
p(x)    0.0576   0.1977   0.2965   0.2541   0.1361   ...
F(x)    0.0576   0.2553   0.5518   0.8059   0.9420   ...

Since

F(0) < 0.12023 ≤ F(1),

the first simulated value of X is $x_1 = 1$. Similarly, since

F(3) < 0.82328 ≤ F(4),

the second simulated value of X is $x_2 = 4$. The third simulated value is $x_3 = 2$,
since F(1) < 0.54810 ≤ F(2). □
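
The general procedure, applied to Example 6.4, can be sketched as follows. The function names are chosen for illustration; the c.d.f. is accumulated from the binomial probability function until it first reaches u, which is the smallest x satisfying F(x − 1) < u ≤ F(x).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def simulate_discrete(u, pmf, support):
    """Return the smallest x in `support` with F(x) >= u, so F(x - 1) < u <= F(x)."""
    cumulative = 0.0
    for x in support:
        cumulative += pmf(x)
        if u <= cumulative:
            return x
    return support[-1]  # guard against rounding error when u is very close to 1


us = [0.12023, 0.82328, 0.54810]
values = [simulate_discrete(u, lambda x: binomial_pmf(x, 8, 0.3), range(9)) for u in us]
print(values)  # [1, 4, 2], as in Example 6.4
```

The same function can be used with any probability function on a finite range, for instance to simulate the six observations from B(50, 0.37) mentioned earlier without simulating fifty Bernoulli trials each time.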

Question  6.7  Use  the  numbers  in  the  bottom  row  of  the  table  on  page  42  of 
Neave  to  simulate  five  observations  from  a  Poisson  distribution  with 
mean  3.8.  □ 

That  completes  this  short  section  on  simulation.  You  will  meet  many  examples  of 
simulation  as  you  work  through  the  rest  of  the  course. 


Objectives 


After  studying  this  unit  you  should  be  able  to: 

apply  the  rules  of  probability  described  in  Section  1  to  solve  problems; 

apply  the  properties  of  probability  functions  for  discrete  random  variables,  as 
described  in  Section  2,  and  the  specific  distributions  described  in  Subsection  2.2; 

apply  the  properties  of  probability  generating  functions  as  described  in  Section  3; 

apply  the  properties  of  p.d.f.s  and  c.d.f.s  for  continuous  random  variables  as 
described  in  Section  4  and  the  specific  distributions  described  in  Section  5; 

simulate  observations  from  discrete  and  continuous  distributions; 

use relevant tables in Neave as indicated in the text.


Appendix: Solutions to questions


Section  1 

1.1 (i) There are six equally likely outcomes, so the
probability of a 6 is $\tfrac{1}{6}$.

(ii) The die shows an even number if any of the
mutually exclusive events 2, 4 or 6 occur. Therefore

$$P(2 \cup 4 \cup 6) = \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{2}.$$

(iii) P(any specific card) = $\tfrac{1}{52}$, so
P(card is a heart) = $\tfrac{13}{52} = \tfrac{1}{4}$.

(iv) Using Rule (1.1),

P(card is not a heart) = 1 − P(card is a heart) = $1 - \tfrac{1}{4} = \tfrac{3}{4}$.

(v) There are 12 court cards, so
P(court card) = $\tfrac{12}{52} = \tfrac{3}{13}$.

(vi) There are 3 hearts which are also court cards, so
P(heart and court card) = $\tfrac{3}{52}$.

(vii) There are 13 hearts and another 9 court cards
which are not hearts (22 cards altogether), so

P(heart ∪ court card) = $\tfrac{22}{52} = \tfrac{11}{26}$.

(viii) The event that a card is neither a heart nor a
court card is the complement of the event in part (vii), so
its probability is

1 − P(heart ∪ court card) = $1 - \tfrac{11}{26} = \tfrac{15}{26}$.


1.2 (i) $P(\bar{A}) = 1 - P(A) = 0.5$, by Rule (1.1).

(ii) $P(\bar{B}) = 1 - P(B) = 0.6$

(iii) By Rule (1.3),

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.5 + 0.4 - 0.1 = 0.8.$$

(iv) $$P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - 0.8 = 0.2$$

Here we have used the fact that the complement of $A \cup B$
is $\bar{A} \cap \bar{B}$. If you are not convinced that this is so, then
try drawing two Venn diagrams, one showing $\bar{A} \cap \bar{B}$ and
the other $A \cup B$. The result should be clear from the
diagrams.

(v) $$P(A \cap \bar{B}) = P(A) - P(A \cap B) = 0.5 - 0.1 = 0.4$$

This is shown in the Venn diagram below.


1.3 (i) By Formula (1.4),

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.1}{0.4} = 0.25.$$

(B) 

(iii) $$P(\bar{A} \mid \bar{B}) = \frac{P(\bar{A} \cap \bar{B})}{P(\bar{B})} = \frac{0.2}{0.6} = \frac{1}{3}$$

Notice  that,  given  B  occurs,  either  A  or  A  must  occur,  so 
P(A\B)  +  P(A\B)  =  1. 

This is an example of a general result: for any events A
and E (with P(E) ≠ 0),

$$P(A \mid E) + P(\bar{A} \mid E) = 1.$$


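1.4 (i) We are given P(E) = 0.45, P(L) = 0.35,
$P(\bar{E} \cap \bar{L}) = 0.25$.

(ii) Rearranging Equation (1.3),

$$P(E \cap L) = P(E) + P(L) - P(E \cup L).$$

$$P(E \cup L) = 1 - P(\text{watches neither}) = 1 - P(\bar{E} \cap \bar{L}) = 0.75.$$

So $P(E \cap L) = 0.45 + 0.35 - 0.75 = 0.05$.

(iii) $$P(L \mid E) = \frac{P(L \cap E)}{P(E)} = \frac{P(E \cap L)}{P(E)} = \frac{0.05}{0.45} = \frac{1}{9}.$$

(iv) $$P(\bar{L} \mid \bar{E}) = \frac{P(\bar{L} \cap \bar{E})}{P(\bar{E})} = \frac{0.25}{0.55} = \frac{5}{11}.$$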


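1.5 (i) We are given $P(S \mid R) = 0.7$, $P(\bar{S} \mid \bar{R}) = 0.55$,
P(R) = 0.6.

(ii) $$P(R \cap S) = P(S \cap R) = P(S \mid R)\,P(R) = 0.7 \times 0.6 = 0.42$$

(iii) $$P(\bar{R} \cap S) = P(S \cap \bar{R}) = P(S \mid \bar{R})\,P(\bar{R})$$

Now

$$P(S \mid \bar{R}) = 1 - P(\bar{S} \mid \bar{R}) = 1 - 0.55 = 0.45$$

and

$$P(\bar{R}) = 1 - P(R) = 1 - 0.6 = 0.4,$$

so

$$P(\bar{R} \cap S) = 0.45 \times 0.4 = 0.18.$$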


1.6  (i)  We  are  given  P(F)  =  |,  P(S)  —  |  and 

P(S\F)  =  |. 

(ii)  We  require  P(F|5).  By  Bayes’  Formula, 

^  1  ;  P(S)  ~  |  ~6’ 


(iii)  The  first  probability  required  is  P(S\F). 
By  Bayes’  Formula, 


P(S\F)  =  =  (1  -  P(P|S))  P(S) 

P(P)  l-P(P) 

h-i)x|  , 

1_2  10‘ 

A  3 
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Hence  the  conditional  probability  that  Anna  gets  the 
second  problem  right  is  ^ ,  and  the  conditional 
probability  that  she  gets  it  wrong  is 

P(S\F)  =  l-P(S\F)  =  f0. 

1.7 (i) P(A) = 0.1, P(B) = 0.05.

Since A and B are independent events,

$$P(A \cap B) = P(A)\,P(B) = 0.1 \times 0.05 = 0.005.$$

(ii) We require the probability that exactly one defect
occurs; that is, the probability of ‘A but not-B’ or ‘B but
not-A’. In set notation, this is

$$P((A \cap \bar{B}) \cup (\bar{A} \cap B)).$$

The two events $A \cap \bar{B}$ and $\bar{A} \cap B$ are mutually exclusive,
so (by Rule (1.2)) this probability is equal to

$$P(A \cap \bar{B}) + P(\bar{A} \cap B).$$

Since the events A and B are independent, so are the
events A and $\bar{B}$ and the events $\bar{A}$ and B. Therefore

$$P(A \cap \bar{B}) = P(A)\,P(\bar{B}),$$

$$P(\bar{A} \cap B) = P(\bar{A})\,P(B).$$

Finally, for any event E, we have $P(\bar{E}) = 1 - P(E)$
(Rule (1.1)), so the required probability is

$$P(A)(1 - P(B)) + (1 - P(A))P(B) = 0.1 \times 0.95 + 0.9 \times 0.05 = 0.14.$$


1.8 We are given $P(R \mid D) = 0.96$, $P(R \mid \bar{D}) = 0.03$ and
$P(D) = \tfrac{1}{250} = 0.004$.

(i) We require P(R). By the Theorem of Total
Probability,

$$P(R) = P(R \mid D)\,P(D) + P(R \mid \bar{D})\,P(\bar{D}) = 0.96 \times 0.004 + 0.03 \times 0.996 = 0.03372.$$

(ii) We require $P(D \mid R)$. By Bayes’ Formula,

$$P(D \mid R) = \frac{P(R \mid D)\,P(D)}{P(R)} = \frac{0.96 \times 0.004}{0.03372} \approx 0.1139.$$
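
The arithmetic in Solution 1.8 can be checked with a short sketch of the Theorem of Total Probability and Bayes’ Formula for an event and its complement. The function and argument names are assumptions made for illustration.

```python
def total_probability(p_r_given_d, p_r_given_not_d, p_d):
    """P(R) = P(R|D)P(D) + P(R|not D)P(not D)."""
    return p_r_given_d * p_d + p_r_given_not_d * (1 - p_d)


def bayes(p_r_given_d, p_r_given_not_d, p_d):
    """P(D|R) = P(R|D)P(D) / P(R)."""
    return p_r_given_d * p_d / total_probability(p_r_given_d, p_r_given_not_d, p_d)


p_r = total_probability(0.96, 0.03, 0.004)
p_d_given_r = bayes(0.96, 0.03, 0.004)
print(round(p_r, 5), round(p_d_given_r, 4))
```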


1.9 We are given P(G) = 0.7, P(N) = 0.2, P(U) = 0.1,
$P(I \mid G) = 0.05$, $P(I \mid N) = 0.95$ and $P(I \mid U) = 0.25$.

(i) By the Theorem of Total Probability,

$$P(I) = P(I \mid G)P(G) + P(I \mid N)P(N) + P(I \mid U)P(U) = 0.05 \times 0.7 + 0.95 \times 0.2 + 0.25 \times 0.1 = 0.25.$$

(ii) By Bayes’ Formula,

$$P(G \mid I) = \frac{P(I \mid G)\,P(G)}{P(I)} = \frac{0.05 \times 0.7}{0.25} = 0.14.$$


Section  2 


2.1 (i)

Y = 3 when the die shows 6;

Y = 2 when the die shows 4 or 5;

Y = −1 when the die shows 1, 2 or 3.

So the range of Y is {−1, 2, 3}.

(ii) The probability function of Y may be written as
below.

y       −1     2      3
p(y)    1/2    1/3    1/6

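2.2 (i) $P(Y = 4) = \binom{7}{4}\, 0.3^4\, 0.7^3 \approx 0.0972$

(ii) $P(V = 3) = \binom{10}{3}\, 0.6^3\, 0.4^7 \approx 0.0425$

(iii) $P(W = 10) = \binom{14}{10}\, 0.42^{10}\, 0.58^4 \approx 0.0193$

(iv) $$P(Z \le 2) = P(Z = 0) + P(Z = 1) + P(Z = 2) = 0.015625 + 0.09375 + 0.234375 = 0.34375 \approx 0.3438$$

(Alternatively, $P(Z \le 2) = \tfrac{1}{64} + \tfrac{6}{64} + \tfrac{15}{64} = \tfrac{22}{64} = \tfrac{11}{32}$.)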

2.3  (i)  The  number  of  shots  required  to  hit  the 
bull’s-eye  has  a  geometric  distribution,  X  ~  Gi(0.3),  so 

P(X  =  3)  =  0.72  x  0.3  =  0.147. 

(ii)  P(X  >  5)  =  P(X  >  6)  =  0.75 
=  0.16807  ~  0.1681 

2.4 The number $N$ of withdrawals before Sarah loses
her card has a $G_0(0.8)$ distribution. We require
$P(N < 5)$. This can be calculated directly:

\[ P(N < 5) = \sum_{n=0}^{4} P(N = n) = 0.2 + 0.2 \times 0.8 + 0.2 \times 0.8^2 + 0.2 \times 0.8^3 + 0.2 \times 0.8^4 \approx 0.6723. \]

Alternatively, $P(N < 5) = 1 - P(N \ge 5)$, and $N \ge 5$
implies that all the first five trials are successful. Hence

\[ P(N < 5) = 1 - 0.8^5 \approx 0.6723. \]
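As a quick check that the direct sum and the complement argument in Solution 2.4 agree, here is a small sketch. The helper name `g0_pmf` is hypothetical; it encodes the probability function $p(n) = q\,p^n$ of a $G_0(p)$ variate as used in the working above.

```python
def g0_pmf(n, p):
    """P(N = n) for N ~ G0(p): n successes (probability p each) before the first failure."""
    return (1 - p) * p**n


direct = sum(g0_pmf(n, 0.8) for n in range(5))  # P(N = 0) + ... + P(N = 4)
complement = 1 - 0.8**5                         # 1 - P(first five trials all succeed)

print(round(direct, 4), round(complement, 4))
```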

2.5 The number $X$ of shots required to hit the
bull's-eye three times has a negative binomial distribution
with parameters 3 and 0.3 (starting at 3), so

\[ P(X = 7) = \binom{6}{2}\, 0.3^3\, 0.7^4 \approx 0.0972. \]

2.6 (i) If $X \sim \text{Poisson}(2.5)$, then
$P(X = 0) = e^{-2.5} \approx 0.0821$.

(ii) $P(X > 3) = 1 - (P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3))$
$\approx 1 - (0.0821 + 0.2052 + 0.2565 + 0.2138)$
$= 1 - 0.7576 = 0.2424$

2.7 If $X$ is the number of men out of 100 with
colour-deficient sight, then $X \sim B(100, 0.06)$. Since $n$ is
large and $p$ is small, a Poisson approximation may be
used: $\text{Poisson}(100 \times 0.06) = \text{Poisson}(6)$.

\[ P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) \approx e^{-6} + 6e^{-6} + 18e^{-6} \approx 0.0620. \]

(Using the binomial distribution, you should
obtain 0.0566.)
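The comparison between the Poisson approximation and the exact binomial value in Solution 2.7 can be sketched as follows. The function names are illustrative only.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)


exact = sum(binomial_pmf(k, 100, 0.06) for k in range(3))  # P(X < 3) under B(100, 0.06)
approx = sum(poisson_pmf(k, 6) for k in range(3))          # P(X < 3) under Poisson(6)

print(round(exact, 4), round(approx, 4))
```

The two values differ in the third decimal place, which gives a feel for the accuracy of the approximation when $n = 100$ and $p = 0.06$.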

2.8 (i) $B(1, 0.2)$

(ii) $T \sim G_1(0.2)$

(iii) $W \sim B(50, 0.8)$

(iv) $V$ is negative binomial with parameters 3 and 0.2
and range $\{0, 1, 2, \ldots\}$.




2.9 The expected value of $X$ is

$\mu = E(X) = 1 \times 0.4 + 2 \times 0.3 + 3 \times 0.2 + 4 \times 0.1 = 2$.

2.10 We shall use Formula (2.8) for the variance.
$E(X^2) = 1^2 \times 0.4 + 2^2 \times 0.3 + 3^2 \times 0.2 + 4^2 \times 0.1 = 5$,

and, from Solution 2.9, $\mu = E(X) = 2$. So,

$V(X) = E(X^2) - \mu^2 = 5 - 2^2 = 1$.
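The calculation in Solutions 2.9 and 2.10 follows a fixed pattern: compute $E(X)$ and $E(X^2)$ from the probability function, then subtract. A minimal sketch, with a hypothetical helper `mean_and_variance` that takes the probability function as a dictionary:

```python
def mean_and_variance(prob):
    """Return (E(X), V(X)) for a discrete distribution given as {x: p(x)}."""
    mean = sum(x * p for x, p in prob.items())
    second_moment = sum(x**2 * p for x, p in prob.items())
    return mean, second_moment - mean**2


# Probability function used in Solutions 2.9 and 2.10.
mu, var = mean_and_variance({1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1})
print(round(mu, 6), round(var, 6))
```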

2.11 (i) $E(X) = V(X) = \mu_1$ and $E(Y) = V(Y) = \mu_2$,
so

$E(X + Y) = \mu_1 + \mu_2$,

$V(X + Y) = \mu_1 + \mu_2$.

(ii) $E(X) = E(Y) = 3$ and $V(X) = V(Y) = \frac{2}{3} / \left(\frac{1}{3}\right)^2 = 6$,
so

$E(X + Y) = 3 + 3 = 6$,

$V(X + Y) = 6 + 6 = 12$.


Section  3 


3.1 (i) $\Pi(s) = 0.5s + 0.4s^2 + 0.1s^3$

(ii) $p(1) = p(2) = \cdots = p(6) = \frac{1}{6}$, so the p.g.f. is
$\Pi(s) = \frac{1}{6}s + \frac{1}{6}s^2 + \frac{1}{6}s^3 + \frac{1}{6}s^4 + \frac{1}{6}s^5 + \frac{1}{6}s^6$.

(iii) $p(7) = P(X = 7) = 1$ and $p(x) = P(X = x) = 0$
otherwise, so the p.g.f. of $X$ is

$\Pi(s) = 0s^0 + 0s^1 + \cdots + 1s^7 + 0s^8 + \cdots = s^7$.

3.2  (i)  Since  II(s)  can  be  rewritten  as 
n  (s)  =  \s  +  \s2  +  ±s3, 

the  probability  function  is  given  by 
p(i)  =  p(2)  =  J,  p(3)  = 

the  probability  p(x)  is  0  otherwise. 

(ii) Rewriting the p.g.f. as

$\Pi(s) = \frac{1}{9}(4 + 4s^2 + s^4) = \frac{4}{9} + \frac{4}{9}s^2 + \frac{1}{9}s^4$,
we can pick out the probabilities $p(x) = P(X = x)$:

$p(0) = \frac{4}{9}$, $p(2) = \frac{4}{9}$, $p(4) = \frac{1}{9}$;
the probability $p(x)$ is 0 otherwise.

3.3 The probability function of a $G_0(p)$ variate is

$p(x) = p^x q$, $x = 0, 1, \ldots$,

so the p.g.f. is given by

\[ \Pi(s) = \sum_{x=0}^{\infty} q p^x s^x = q\left(1 + ps + (ps)^2 + \cdots\right) = \frac{q}{1 - ps}. \]


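3.4 The probability function of a Poisson($\mu$) variate is

$p(x) = \frac{e^{-\mu}\mu^x}{x!}$, $x = 0, 1, \ldots$.

So the p.g.f. is given by

\[ \Pi(s) = \sum_{x=0}^{\infty} \frac{e^{-\mu}\mu^x}{x!}\, s^x = e^{-\mu} \sum_{x=0}^{\infty} \frac{(\mu s)^x}{x!} = e^{-\mu} e^{\mu s} = e^{-\mu(1-s)}. \]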


3.5 (i) The p.g.f. of a Poisson($\mu$) variate is

$\Pi(s) = e^{-\mu(1-s)}$.

So, if $X \sim \text{Poisson}(4)$, then

$\Pi(s) = e^{-4(1-s)}$.

(ii) The p.g.f. of a $G_1(p)$ variate is

\[ \Pi(s) = \frac{ps}{1 - qs}, \]

so putting $p = \frac{1}{3}$, $q = \frac{2}{3}$, we obtain

\[ \Pi(s) = \frac{s}{3 - 2s}. \]


3.6 (i) $\Pi(s) = \frac{1}{81}(2 + s)^4 = \left(\frac{2}{3} + \frac{1}{3}s\right)^4$, so $X \sim B(4, \frac{1}{3})$.

(ii)  n(s)  =  so  X  ~  Poisson(|t). 

(iii) $\Pi(s) = \frac{1}{5 - 4s} = \frac{1/5}{1 - \frac{4}{5}s}$, so $X \sim G_0(\frac{4}{5})$.

5 


(iv)  nw  =  =  (rfx;)  ■ 

so  X  has  a  negative  binomial  distribution  with 
parameters  r  =  3,  p  =  |  and  range  {0, 1, . . .}. 


3.7 (i) $\Pi(s) = e^{-\mu(1-s)}$, so
$\Pi'(s) = \mu e^{-\mu(1-s)}$,
$\Pi''(s) = \mu^2 e^{-\mu(1-s)}$.

Putting $s = 1$ and using Formulas (3.6) and (3.7), we
obtain

$E(X) = \Pi'(1) = \mu$,

$V(X) = \Pi''(1) + \mu - \mu^2 = \mu^2 + \mu - \mu^2 = \mu$.


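(ii) $\Pi(s) = \frac{ps}{1 - qs}$, so

\[ \Pi'(s) = \frac{p}{(1 - qs)^2}, \qquad \Pi''(s) = \frac{2pq}{(1 - qs)^3}. \]

Putting $s = 1$, we obtain

\[ \mu = \Pi'(1) = \frac{p}{(1 - q)^2} = \frac{p}{p^2} = \frac{1}{p}, \]

\[ \sigma^2 = \Pi''(1) + \mu - \mu^2 = \frac{2pq}{(1 - q)^3} + \frac{1}{p} - \frac{1}{p^2} = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{2q + p - 1}{p^2} = \frac{q}{p^2}. \]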
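The mean and variance formulas obtained from the p.g.f. of a $G_1(p)$ variate can be checked numerically by approximating $\Pi'(1)$ and $\Pi''(1)$ with central differences. This is only a sketch: the step size `h` and the value $p = 0.3$ are arbitrary illustrative choices, and the finite-difference approach is a numerical check, not the method used in the solution.

```python
def pgf_g1(s, p):
    """p.g.f. of a G1(p) variate: ps / (1 - qs), valid for |s| < 1/q."""
    q = 1 - p
    return p * s / (1 - q * s)


def mean_var_from_pgf(pgf, h=1e-4):
    """Approximate E(X) = Pi'(1) and V(X) = Pi''(1) + Pi'(1) - Pi'(1)**2."""
    first = (pgf(1 + h) - pgf(1 - h)) / (2 * h)
    second = (pgf(1 + h) - 2 * pgf(1) + pgf(1 - h)) / h**2
    return first, second + first - first**2


p = 0.3  # illustrative value
mean, var = mean_var_from_pgf(lambda s: pgf_g1(s, p))
print(round(mean, 3), round(var, 3))  # compare with 1/p and q/p**2
```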


3.8 The p.g.f.s of $X$ and $Y$ are

$\Pi_X(s) = e^{-\mu_1(1-s)}$ and $\Pi_Y(s) = e^{-\mu_2(1-s)}$,
so the p.g.f. of $Z = X + Y$ is

\[ \Pi_Z(s) = \Pi_X(s)\,\Pi_Y(s) = e^{-\mu_1(1-s)}\, e^{-\mu_2(1-s)} = e^{-(\mu_1 + \mu_2)(1-s)}. \]

This is the p.g.f. of a Poisson distribution with parameter
$\mu_1 + \mu_2$, so $Z \sim \text{Poisson}(\mu_1 + \mu_2)$.
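The result of Solution 3.8 can also be seen without p.g.f.s, by convolving the two probability functions directly and comparing with the Poisson probabilities for the summed parameter. A minimal sketch, assuming $X$ and $Y$ are independent; the parameter values 1.5 and 2.5 are arbitrary illustrations.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)


def sum_pmf(z, mu1, mu2):
    """P(X + Y = z) for independent X ~ Poisson(mu1) and Y ~ Poisson(mu2)."""
    return sum(poisson_pmf(x, mu1) * poisson_pmf(z - x, mu2) for x in range(z + 1))


mu1, mu2 = 1.5, 2.5  # illustrative values
max_gap = max(abs(sum_pmf(z, mu1, mu2) - poisson_pmf(z, mu1 + mu2)) for z in range(20))
print(max_gap < 1e-12)
```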
(iii) A sketch of the p.d.f. is shown below.


3.9 The p.g.f. of a $G_0(p)$ variate is

\[ \Pi(s) = \frac{q}{1 - ps}, \]

so the p.g.f. of $Z = X_1 + X_2 + \cdots + X_n$ is

\[ \Pi_Z(s) = \frac{q}{1 - ps} \times \frac{q}{1 - ps} \times \cdots \times \frac{q}{1 - ps} = \left(\frac{q}{1 - ps}\right)^n. \]

This is the p.g.f. of a negative binomial distribution with
range $\{0, 1, \ldots\}$ and parameters $n$ and $p$.


Section  4 


4.1 (i) $P(T \le 6) = F(6) = (36 - 4)/96 = \frac{1}{3}$.

So the probability that an oil change takes not more than
6 minutes is $\frac{1}{3}$.

(ii) If $P(T \le c) = 0.9$, then
$(c^2 - 4)/96 = 0.9$,

so that

$c^2 = 90.4$,
$c \approx 9.508$.

(There is a probability of 0.9 that the oil change will be
completed within about $9\frac{1}{2}$ minutes.)
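The two calculations in Solution 4.1 amount to evaluating the c.d.f. and then inverting it. The sketch below assumes, as the working indicates, that $F(t) = (t^2 - 4)/96$ for $2 \le t \le 10$, with $F(t) = 0$ below 2 and $F(t) = 1$ above 10; the function names are illustrative.

```python
from math import sqrt


def cdf(t):
    """c.d.f. of the oil-change time T (in minutes), as read from the working."""
    if t < 2:
        return 0.0
    if t > 10:
        return 1.0
    return (t**2 - 4) / 96


def quantile(u):
    """Solve F(c) = u for 0 <= u <= 1, giving c = sqrt(96u + 4)."""
    return sqrt(96 * u + 4)


print(round(cdf(6), 4), round(quantile(0.9), 3))
```

The same `cdf` function gives interval probabilities by subtraction, for example `cdf(7) - cdf(6)`.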

(iii) Note that the values of $F(t)$ lie between 0 and 1,
and $F(t)$ is non-decreasing. The c.d.f. is shown below.


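4.3 For $w < 2$, $F(w) = 0$.
For $2 \le w \le 4$,

\[ F(w) = \int_2^w \left(2 - \tfrac{1}{2}u\right) du = \left[2u - \tfrac{1}{4}u^2\right]_2^w = 2w - \tfrac{1}{4}w^2 - 3. \]

For $w > 4$, $F(w) = 1$. So:

\[ F(w) = \begin{cases} 0, & w < 2, \\ 2w - \tfrac{1}{4}w^2 - 3, & 2 \le w \le 4, \\ 1, & w > 4. \end{cases} \]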


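4.2 (i) Differentiating $F(t)$ gives:

\[ F'(t) = \begin{cases} 0, & t < 2,\ t > 10, \\ \dfrac{t}{48}, & 2 < t < 10. \end{cases} \]

The derivative cannot be calculated at $t = 2$ or $t = 10$.
For convenience we shall specify $f$ at $t = 2$ and $t = 10$ by
$f(t) = t/48$, so we have

\[ f(t) = \frac{t}{48}, \quad 2 \le t \le 10. \]

(ii) $P(6 \le T \le 7) = F(7) - F(6) = \frac{45}{96} - \frac{32}{96} = \frac{13}{96}$.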


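4.4 The mean of $W$ is

\[ \mu = E(W) = \int_2^4 w\left(2 - \tfrac{1}{2}w\right) dw = \int_2^4 \left(2w - \tfrac{1}{2}w^2\right) dw = \left[w^2 - \tfrac{1}{6}w^3\right]_2^4 = \left(16 - \tfrac{32}{3}\right) - \left(4 - \tfrac{4}{3}\right) = 2\tfrac{2}{3}. \]

Now

\[ E(W^2) = \int_2^4 w^2\left(2 - \tfrac{1}{2}w\right) dw = \int_2^4 \left(2w^2 - \tfrac{1}{2}w^3\right) dw = \left[\tfrac{2}{3}w^3 - \tfrac{1}{8}w^4\right]_2^4 = \left(\tfrac{128}{3} - 32\right) - \left(\tfrac{16}{3} - 2\right) = 7\tfrac{1}{3}, \]

so the variance of $W$ is

\[ V(W) = E(W^2) - \mu^2 = 7\tfrac{1}{3} - \left(2\tfrac{2}{3}\right)^2 = \tfrac{2}{9}. \]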
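Because the density in Solution 4.4 is a polynomial, the moments can be verified exactly with rational arithmetic. The following sketch assumes the p.d.f. $f(w) = 2 - \frac{1}{2}w$ on $2 \le w \le 4$, as used in the integrals above; the helper `integrate_poly` is a hypothetical name.

```python
from fractions import Fraction


def integrate_poly(coeffs, a, b):
    """Integrate sum(coeffs[k] * w**k) exactly over [a, b]."""
    total = Fraction(0)
    for k, c in enumerate(coeffs):
        total += Fraction(c) * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
    return total


density = [2, Fraction(-1, 2)]                      # f(w) = 2 - w/2, constant term first
mean = integrate_poly([0] + density, 2, 4)          # integral of w * f(w)
second_moment = integrate_poly([0, 0] + density, 2, 4)  # integral of w**2 * f(w)
variance = second_moment - mean**2

print(mean, second_moment, variance)
```

Prepending zeros to the coefficient list multiplies the polynomial by $w$ or $w^2$, which keeps the helper to a single loop.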


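4.5 The mean is

\[ E(T) = \int_2^{10} t \cdot \frac{t}{48}\, dt = \left[\frac{t^3}{144}\right]_2^{10} = 6\tfrac{8}{9} \approx 6 \text{ minutes } 53 \text{ seconds}, \]

and the variance is

$V(T) = E(T^2) - \mu^2$,

where

\[ E(T^2) = \int_2^{10} \frac{t^3}{48}\, dt = 52. \]

Hence, $V(T) = 52 - \left(6\tfrac{8}{9}\right)^2 \approx 4.543$ min$^2$.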


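4.6 The mean lifetime of the birds is

\[ E(T) = \int_0^{\infty} (1 - F(t))\, dt = \int_0^{\infty} \left[1 - \left(1 - \frac{1}{(t + 1)^2}\right)\right] dt = \int_0^{\infty} \frac{1}{(t + 1)^2}\, dt = \left[-\frac{1}{t + 1}\right]_0^{\infty} = 1 \text{ year}. \]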


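4.7 The mean time for an oil change is

\[ E(T) = \int_0^{\infty} (1 - F(t))\, dt = \int_0^2 1\, dt + \int_2^{10} \left(1 - \frac{t^2 - 4}{96}\right) dt = 2 + 4\tfrac{8}{9} = 6\tfrac{8}{9} \text{ min}, \]

as before.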


Section  5 


5.1 When $X \sim U(a, b)$,

\[ V(X) = E(X^2) - \mu^2 = \int_a^b \frac{x^2}{b - a}\, dx - \mu^2 = \frac{b^3 - a^3}{3(b - a)} - \frac{(a + b)^2}{4} = \frac{4(b^3 - a^3) - 3(b - a)(a + b)^2}{12(b - a)} = \frac{(b - a)^2}{12}. \]


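5.2 (i) $F(x) = \int_0^x \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}$, $x \ge 0$.

(ii) Since $X \ge 0$,

\[ E(X) = \int_0^{\infty} (1 - F(x))\, dx = \int_0^{\infty} e^{-\lambda x}\, dx = \left[-\frac{1}{\lambda} e^{-\lambda x}\right]_0^{\infty} = \frac{1}{\lambda}. \]

The mean of the exponential distribution is $1/\lambda$.

(iii) Using integration by parts twice,

\[ E(X^2) = \int_0^{\infty} x^2 \lambda e^{-\lambda x}\, dx = \left[-x^2 e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} 2x e^{-\lambda x}\, dx = 0 + 2\int_0^{\infty} x e^{-\lambda x}\, dx \]

\[ = 2\left(\left[-\frac{1}{\lambda} x e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} \frac{1}{\lambda} e^{-\lambda x}\, dx\right) = 0 + 2\left[-\frac{1}{\lambda^2} e^{-\lambda x}\right]_0^{\infty} = \frac{2}{\lambda^2}. \]

So

\[ V(X) = E(X^2) - \mu^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}. \]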


5.3  Men’s  calls  ~  M(|);  women’s  calls  ~  M(|); 
teenagers’  calls  ~  M(^). 

Hence,  minimum  waiting  time  ~  M(|  +  |  +  *))or 
M(0.8).  So 

$P(\text{wait less than 2 minutes}) = 1 - e^{-0.8 \times 2} \approx 0.798$.

The  memoryless  property  has  been  used  here:  it  says 
that  the  length  of  time  the  calls  have  already  lasted  is 
irrelevant. 


5.4 Let $T_1$ be the time until the person in the kiosk
leaves. By the memoryless property, $T_1 \sim M(\frac{1}{4})$. Let $T_2$,
$T_3$ be the lengths of the calls of the two people waiting to
use the kiosk: $T_i \sim M(\frac{1}{4})$, $i = 2, 3$. The total time
$W = T_1 + T_2 + T_3$ that you will have to wait to use the
kiosk has a gamma distribution:

$W \sim \Gamma(3, \frac{1}{4})$.

So

$E(W) = 12$ minutes, $V(W) = 48$ (minutes)$^2$,
and the standard deviation of your waiting time is
$\sqrt{V(W)} \approx 6.928$ minutes.


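5.5 (i) $P(15 < X < 18) = P(X < 18) - P(X < 15)$

\[ = P\left(\frac{X - 16}{\sqrt{5}} < \frac{18 - 16}{\sqrt{5}}\right) - P\left(\frac{X - 16}{\sqrt{5}} < \frac{15 - 16}{\sqrt{5}}\right) \]

$= P(Z < 0.894) - P(Z < -0.447)$
$= 0.8144 - 0.3275$
$\approx 0.487$.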
(ii) If $P(X > c) = 0.1$, then

\[ P\left(Z > \frac{c - 16}{\sqrt{5}}\right) = 0.1 \quad \text{and} \quad P\left(Z \le \frac{c - 16}{\sqrt{5}}\right) = 0.9, \]

so $\dfrac{c - 16}{\sqrt{5}} = 1.2816$, from Neave, page 20.

Hence $c \approx 18.87$.

(Notice that the $z$-values obtainable from Neave page 20
are more accurate than $z$-values obtained by
interpolation from the tables on pages 18-19.)
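Both parts of Solution 5.5 can be checked without tables using the `NormalDist` class from Python's standard `statistics` module. The sketch assumes, as the standardisation in the working indicates, that $X$ is normal with mean 16 and variance 5.

```python
from math import sqrt
from statistics import NormalDist

# X ~ N(16, 5): NormalDist takes the standard deviation, not the variance.
x = NormalDist(mu=16, sigma=sqrt(5))

prob = x.cdf(18) - x.cdf(15)  # part (i): P(15 < X < 18)
c = x.inv_cdf(0.9)            # part (ii): the value c with P(X > c) = 0.1

print(round(prob, 3), round(c, 2))
```

Small differences from the tabulated answer in the last digit are possible, since the tables round the $z$-values before looking up probabilities.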

Section  6 

6.1 For the given distribution, for $x \ge 0$,

\[ F(x) = \int_0^x 4t e^{-2t^2}\, dt = 1 - e^{-2x^2}. \]

Solving $F(x) = 0.4287$, we have

$e^{-2x^2} = 0.5713$,
so that

$2x^2 \approx 0.5598$
and

$x \approx 0.529$ (since $x \ge 0$).

The simulated observation is 0.529.
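The inversion carried out in Solution 6.1 is an instance of simulating from a c.d.f. by solving $F(x) = u$. A minimal sketch for this particular distribution follows; the function names are illustrative.

```python
from math import exp, log, sqrt


def cdf(x):
    """F(x) = 1 - exp(-2 x**2) for x >= 0."""
    return 1 - exp(-2 * x**2)


def simulate(u):
    """Solve F(x) = u for x >= 0, where 0 <= u < 1 is a uniform random number."""
    return sqrt(-log(1 - u) / 2)


x = simulate(0.4287)
print(round(x, 3))
```

In practice `u` would come from a table of random numbers or from `random.random()`; here the value 0.4287 from the solution is used directly.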

6.2 The c.d.f. of $T$, the length of a call, is

$F(t) = 1 - e^{-t/5}$, $t \ge 0$.

Solving $F(t) = u$ gives
$t = -5 \log(1 - u)$,

so the four simulated times (to the nearest second) are:

$t_1 = -5 \log(1 - 0.37336) \approx$ 2 minutes 20 seconds;

$t_2 = -5 \log(1 - 0.63266) \approx$ 5 minutes;

$t_3 = -5 \log(1 - 0.18632) \approx$ 1 minute 2 seconds;

$t_4 = -5 \log(1 - 0.79781) \approx$ 8 minutes.

In M343 'log' is used to indicate a logarithm to base $e$.

You may be familiar with the alternatives 'ln' or '$\log_e$'.
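The four simulated call lengths in Solution 6.2 can be reproduced as follows. This is a sketch under the assumption that times are rounded to the nearest second before being split into minutes and seconds; the helper names are hypothetical. Note that Python's `math.log` is the natural logarithm, matching the convention for 'log' described above.

```python
from math import log


def simulate_call_length(u, mean=5.0):
    """Invert F(t) = 1 - exp(-t / mean) at the uniform random number u (minutes)."""
    return -mean * log(1 - u)


def as_minutes_seconds(t):
    """Convert a time in minutes to (minutes, seconds), to the nearest second."""
    return divmod(round(t * 60), 60)


uniforms = (0.37336, 0.63266, 0.18632, 0.79781)
times = [as_minutes_seconds(simulate_call_length(u)) for u in uniforms]
print(times)
```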

6.3 The numbers from the table must be multiplied
by 5 to give simulated times in minutes. The four
simulated times (to the nearest second) are:

$t_1 = 5 \times 0.6193$ minutes $\approx$ 3 minutes 6 seconds;
$t_2 = 5 \times 1.8350$ minutes $\approx$ 9 minutes 11 seconds;
$t_3 = 5 \times 0.2285$ minutes $\approx$ 1 minute 9 seconds;
$t_4 = 5 \times 1.5106$ minutes $\approx$ 7 minutes 33 seconds.


6.4 Since $\mu = 4$ and $\sigma = \sqrt{9} = 3$, for each number $z$
from the table we must calculate $x = 3z + 4$. The three
simulated observations from $N(4, 9)$ are:

$x_1 = 3 \times (-0.0140) + 4 = 3.9580 \approx 3.958$;
$x_2 = 3 \times 0.3773 + 4 = 5.1319 \approx 5.132$;
$x_3 = 3 \times (-1.0443) + 4 = 0.8671 \approx 0.867$.


6.5 One possible scheme is given in the table below.
There are many other possibilities.

Digit               Outcome
1, 2, 3, 4, 5, 6    Alan wins (W)
7, 8, 9             Alan loses (L)
0                   Ignore (select next digit)

Using the above scheme gives the following simulation.

Digit     6  2  0  4  0  0  1  8  1  2  4
Outcome   W  W  -  W  -  -  W  L  W  W  W

In the simulation, Alan wins seven out of the eight games.
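The digit scheme of Solution 6.5 is easy to mechanise. The sketch below applies it to the eleven digits used in the simulation above; the function name `outcome` is illustrative. Under this scheme each digit that is not ignored gives a win with probability 6/9.

```python
def outcome(digit):
    """Map a random digit to 'W' (1-6), 'L' (7-9) or None (0 is ignored)."""
    if digit == 0:
        return None
    return "W" if digit <= 6 else "L"


digits = [6, 2, 0, 4, 0, 0, 1, 8, 1, 2, 4]
games = [result for result in map(outcome, digits) if result is not None]
wins = games.count("W")

print("".join(games), wins, len(games))
```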

6.6  One  possible  scheme  is  given  in  the  table  below. 

Digits  Outcome 

01, . . . ,  65  Alan  wins 
66, . . . ,  99, 00  Alan  loses 

6.7 Values of the c.d.f. of a Poisson distribution with
mean 3.8 are given in the table below. These were
obtained by summing values of the probability function
taken from page 14 of Neave.

x       0        1        2        3        4        5        ...
p(x)    0.0224   0.0850   0.1615   0.2046   0.1944   0.1477   ...
F(x)    0.0224   0.1074   0.2689   0.4735   0.6679   0.8156   ...

Since

$F(1) < 0.19602 < F(2)$,
the first simulated value is $x_1 = 2$. Similarly,
$x_2 = 5$, $x_3 = 3$, $x_4 = 4$, $x_5 = 3$.
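The table look-up in Solution 6.7 can be expressed as a short search for the smallest $x$ with $F(x) \ge u$. This is a sketch only: it computes the Poisson probabilities directly instead of reading them from Neave, so values may differ from the table in the fourth decimal place, and the function name `simulate_poisson` is hypothetical.

```python
from math import exp


def simulate_poisson(u, mu):
    """Return the smallest x with F(x) >= u, where F is the Poisson(mu) c.d.f."""
    x = 0
    p = exp(-mu)      # p(0)
    cumulative = p    # F(0)
    while cumulative < u:
        x += 1
        p *= mu / x   # p(x) = p(x - 1) * mu / x
        cumulative += p
    return x


x1 = simulate_poisson(0.19602, 3.8)
print(x1)
```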